Slope One Recommender
             on Hadoop
                     YONG ZHENG
         Center for Web Intelligence
                  DePaul University
                      Nov 15, 2012
Overview
• Introduction

• Recommender Systems & Slope One Recommender

• Distributed Slope One on Mahout and Hadoop

• Experimental Setup and Analyses

• Drive Mahout on Hadoop

• Interesting Communities




                            Center for Web Intelligence, DePaul University, USA
Introduction
• About Me: a recommendation guy

• My Research: data mining and recommender systems

• Typical Experimental Research

   1)   Design or improve an algorithm;
   2)   Run the algorithm and the baseline algorithms on datasets;
   3)   Compare experimental results;
   4)   Try different parameters, find the reasons, and even re-design
        and improve the algorithm itself;
   5)   Run the algorithm and the baseline algorithms on datasets;
   6)   Compare experimental results;
   7)   Try different parameters, find the reasons, and even re-design
        and improve the algorithm itself;
   8)   And so on… until the results approach expectations.
Introduction
• Sometimes the data is large-scale.
  e.g. one algorithm may take days to complete. What if the
  experimental results are not as expected? Then we improve the
  algorithm and run it for days again, and again.

  What could we do previously? (for tasks that are not too complicated)
  1). Parallelize, but with complicated synchronization and limited
      resources, such as CPU, memory, etc;
  2). Take advantage of PC labs: let’s do it with 10 PCs



• Nearly all research will ultimately face large-scale
  problems, especially in the domain of data mining.

• But, we have Map-Reduce NOW!
Introduction



• No need to distribute data and tasks manually.
  Instead, we simply generate configurations.
• No need to care about the details, e.g. how data is
  distributed, which machine a specific task will run on
  and when, or how tasks are conducted one by one.
• Instead, we can pre-define the workflow and take
  advantage of the functional contributions of mappers
  and reducers.
• More benefits: replication, balancing, robustness, etc
Recommender Systems

• Collaborative Filtering

• Slope One and Simple Weighted Slope One

• Slope One in Mahout

• Distributed Slope One in Mahout

• Mappers and Reducers




Recommender Systems
Collaborative Filtering (CF)
One of the most popular recommendation algorithms.
• User-based: User-CF
• Item-based: Item-CF, Slope One


[Figure: Example: User-based Collaborative Filtering]
Slope One Recommender
Reference: Daniel Lemire, Anna Maclachlan, Slope One Predictors for
Online Rating-Based Collaborative Filtering, In SIAM Data Mining
(SDM'05), April 21-23, 2005. http://lemire.me/fr/abstracts/SDM2005.html

            User                 Batman              Spiderman
             U1                     3                      4
             U2                     2                      4
             U3                     2                      ?

1). How differently were the two movies rated?
U1 rated Spiderman higher by (4-3) = 1
U2 rated Spiderman higher by (4-2) = 2
On average, Spiderman is rated (1+2)/2 = 1.5 higher

2). Rating differences can drive predictions
If we know U3 gave Batman 2 stars, he will probably rate
Spiderman (2+1.5) = 3.5 stars
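
The two steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic on this slide, not Mahout code; the function names average_diff and predict are made up for the example.

```python
# Ratings from the table above: user -> {movie: rating}.
# U3 has not rated Spiderman yet.
ratings = {
    "U1": {"Batman": 3, "Spiderman": 4},
    "U2": {"Batman": 2, "Spiderman": 4},
    "U3": {"Batman": 2},
}

def average_diff(ratings, target, source):
    """Average of (target - source) over users who rated both movies."""
    diffs = [prefs[target] - prefs[source]
             for prefs in ratings.values()
             if target in prefs and source in prefs]
    return sum(diffs) / len(diffs)

def predict(ratings, user, target, source):
    """Slope One estimate: the user's source rating plus the average difference."""
    return ratings[user][source] + average_diff(ratings, target, source)

print(average_diff(ratings, "Spiderman", "Batman"))   # 1.5
print(predict(ratings, "U3", "Spiderman", "Batman"))  # 3.5
```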
Simple Weighted Slope One
Usually users rate multiple items
        User        HarryPotter       Batman       Spiderman
         U1              5               3              4
         U2              ?               2              4
         U3              4               2              ?

1). How differently were the two movies rated?
Diff(Batman, Spiderman) = [(4-3)+(4-2)]/2 = 1.5
Diff(HarryPotter, Spiderman) = (4-5)/1 = -1
The denominators “2” and “1” are what we call the “count”.

2). Weighted rating differences can drive predictions
We use a simple weighted approach
Based on Batman only, rating = 2+1.5 = 3.5
Based on HarryPotter only, rating = 4-1 = 3
Considering both, predicted rating = (3.5*2 + 3*1) / (2+1) = 3.33
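
A minimal Python sketch of the weighted combination above, assuming the average diffs and counts are already known. The function name weighted_slope_one and the dictionary layout are illustrative choices, not part of any library.

```python
def weighted_slope_one(user_ratings, diffs_to_target):
    """Combine per-item Slope One estimates, weighted by co-rating count.

    user_ratings:    {item: rating} for the items the user has rated.
    diffs_to_target: {item: (average_diff, count)}, where average_diff is
                     (target rating - item rating) averaged over the users
                     who rated both, and count is how many such users exist.
    """
    numerator = 0.0
    denominator = 0
    for item, rating in user_ratings.items():
        if item not in diffs_to_target:
            continue  # nobody co-rated this item with the target
        avg_diff, count = diffs_to_target[item]
        numerator += (rating + avg_diff) * count
        denominator += count
    if denominator == 0:
        return None  # no usable evidence for this user
    return numerator / denominator

# U3's known ratings and the diffs towards Spiderman from the slide.
u3 = {"HarryPotter": 4, "Batman": 2}
diffs_to_spiderman = {"Batman": (1.5, 2), "HarryPotter": (-1.0, 1)}
prediction = weighted_slope_one(u3, diffs_to_spiderman)
print(round(prediction, 2))  # 3.33
```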
Simple Weighted Slope One
      User      HarryPotter       Batman        Spiderman
       u1               5            3             4
       u2               ?            2             4
       u3               4            2             ?
            Question: Online or Offline?
To calculate the predicted ratings, we need 2 matrices:
1). Difference Matrix
               Movie1       Movie2       Movie3        Movie4
     Movie1
     Movie2      -1.5
     Movie3       2           1
     Movie4      -1           0.5          -2

2). Count Matrix
Simply the number of users who rated both items
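
As a rough sketch, both matrices can be built in one pass over the users. The code below is an assumption-laden illustration: it stores only one triangle, keyed by (item_a, item_b) with item_a sorted before item_b, and defines the diff as rating(item_b) - rating(item_a); the other triangle is just the negation.

```python
from collections import defaultdict
from itertools import combinations

# The three-user example table.
ratings = {
    "u1": {"HarryPotter": 5, "Batman": 3, "Spiderman": 4},
    "u2": {"Batman": 2, "Spiderman": 4},
    "u3": {"HarryPotter": 4, "Batman": 2},
}

def build_matrices(ratings):
    """Return (diff, count) dicts keyed by (item_a, item_b), item_a < item_b."""
    total = defaultdict(float)
    count = defaultdict(int)
    for prefs in ratings.values():
        # Every pair of items this user rated contributes one difference.
        for a, b in combinations(sorted(prefs), 2):
            total[(a, b)] += prefs[b] - prefs[a]
            count[(a, b)] += 1
    diff = {pair: total[pair] / count[pair] for pair in total}
    return diff, dict(count)

diff, count = build_matrices(ratings)
print(diff[("Batman", "Spiderman")], count[("Batman", "Spiderman")])            # 1.5 2
print(diff[("HarryPotter", "Spiderman")], count[("HarryPotter", "Spiderman")])  # -1.0 1
```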
Slope One in Mahout
Mahout, an open-source machine learning library.

1). Recommendation algorithms
   User-based CF, Item-based CF, Slope One, etc

2). Clustering
   KMeans, Fuzzy KMeans, etc

3). Classification
   Decision Trees, Naive Bayes, SVM, etc

4). Latent Factor Models
   LDA, SVD, Matrix Factorization, etc
Slope One in Mahout
org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender
Pre-Processing Stage: (class MemoryDiffStorage with Map)
for every item i
   for every other item j
     for every user u expressing preference for both i and j
       add the difference in u’s preference for i and j to an average

Recommendation Stage:
for every item i the user u expresses no preference for
   for every item j that user u expresses a preference for
     find the average preference difference between j and i
     add this diff to u’s preference value for j
     add this to a running average
return the top items, ranked by these averages

Simple weighting: as introduced previously
StdDev weighting: item-item rating diffs with a lower standard
                  deviation should be weighted more heavily
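
The two stages can be mirrored in plain Python as follows. This is a sketch of the pseudocode, not Mahout's implementation: preprocess and recommend are hypothetical names, everything is held in in-memory dictionaries, and only simple (count) weighting is shown; StdDev weighting is left out.

```python
from collections import defaultdict

ratings = {
    "u1": {"HarryPotter": 5, "Batman": 3, "Spiderman": 4},
    "u2": {"Batman": 2, "Spiderman": 4},
    "u3": {"HarryPotter": 4, "Batman": 2},
}

def preprocess(ratings):
    """Pre-processing: diffs[i][j] = (average of rating(i) - rating(j), count)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for prefs in ratings.values():
        for i, rating_i in prefs.items():
            for j, rating_j in prefs.items():
                if i != j:
                    sums[i][j] += rating_i - rating_j
                    counts[i][j] += 1
    return {i: {j: (sums[i][j] / counts[i][j], counts[i][j]) for j in sums[i]}
            for i in sums}

def recommend(ratings, diffs, user, top_n=3, weighted=True):
    """Recommendation: estimate each unrated item and return the top ones."""
    prefs = ratings[user]
    all_items = {item for p in ratings.values() for item in p}
    estimates = {}
    for i in all_items - set(prefs):
        numerator, denominator = 0.0, 0.0
        for j, rating_j in prefs.items():
            if j not in diffs.get(i, {}):
                continue  # no user rated both i and j
            avg_diff, count = diffs[i][j]
            weight = count if weighted else 1
            numerator += (rating_j + avg_diff) * weight
            denominator += weight
        if denominator > 0:
            estimates[i] = numerator / denominator
    ranked = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]

diffs = preprocess(ratings)
top_for_u3 = recommend(ratings, diffs, "u3")
print(top_for_u3)  # one entry: Spiderman, about 3.33
```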
Distributed Slope One in Mahout
As in our previous practice (e.g. the matrix factorization
process), what we need is the Difference Matrix.

Suppose M users have rated N items; the matrix then
requires N(N-1)/2 cells. Density is another aspect:
how many items each user rated. If there are many items and the
rating matrix is dense, the computational cost increases
accordingly.
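
To get a feel for the N(N-1)/2 figure, here is a small worked calculation. The value 3,900 is the MovieLens-1M movie count quoted in the experiment setup of this deck; the result is an upper bound, since only item pairs co-rated by at least one user actually need a cell.

```python
def diff_matrix_cells(n_items):
    """Number of unordered item pairs, i.e. cells in one triangle of the matrix."""
    return n_items * (n_items - 1) // 2

print(diff_matrix_cells(3))     # 3 (the three-movie example)
print(diff_matrix_cells(3900))  # 7603050
```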

Question again: Online or Offline?
Depends on tasks & data.

Large-scale data. Let’s do it offline!
Distributed Slope One in Mahout
package org.apache.mahout.cf.taste.hadoop.slopeone;
      class SlopeOneAverageDiffsJob
      class SlopeOnePrefsToDiffsReducer
      class SlopeOneDiffsToAveragesReducer

package org.apache.mahout.cf.taste.hadoop;
      class ToItemPrefsMapper
      org.apache.hadoop.mapreduce.Mapper

Two Mapper-Reducer Stages:
      1). Create DiffMatrix for each user
      2). Collect AvgDiff info, counts, StdDev

Let’s see how it works…
Mapper and Reducer - 1
          User      HarryPotter        Batman        Spiderman
          U1              5               3              4
          U2              ?               2              4
          U3              4               2              ?

 • Mapper1 (ToItemPrefsMapper)
   <UserID, Pair<ItemID, Rating>>
 • Reducer1 (PrefsToDiffsReducer)
   <Pair<Item1,Item2>, Diff> (for all three users)

 <U1>      Potter   Bat       Spider   <U2>     Potter   Bat   Spider

 Potter                                Potter

  Bat          -2                       Bat     NULL

 Spider        -1    1                 Spider   NULL     2
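
The first stage can be simulated locally in Python to see what the mapper and reducer emit. This is a sketch of the data flow only, not the Mahout classes: the function names are invented, the shuffle is a dictionary, and the numeric item IDs (1 = Potter, 2 = Bat, 3 = Spider) are assumed so that the output orientation matches the matrices above.

```python
from collections import defaultdict

def map_to_item_prefs(line):
    """Mapper 1: parse 'UserID,ItemID,Rating' and emit (user, (item, rating))."""
    user, item, rating = line.strip().split(",")
    return user, (int(item), float(rating))

def reduce_prefs_to_diffs(item_prefs):
    """Reducer 1: for one user, emit ((row_item, col_item), diff) per item pair.

    The diff is rating(row_item) - rating(col_item), with row_item > col_item.
    """
    prefs = sorted(item_prefs)
    for idx, (low_item, low_rating) in enumerate(prefs):
        for high_item, high_rating in prefs[idx + 1:]:
            yield (high_item, low_item), high_rating - low_rating

def run_stage1(lines):
    """Group mapper output by user (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        user, pref = map_to_item_prefs(line)
        grouped[user].append(pref)
    return {user: list(reduce_prefs_to_diffs(prefs))
            for user, prefs in grouped.items()}

# Assumed item IDs: 1 = Potter, 2 = Bat, 3 = Spider.
lines = ["U1,1,5", "U1,2,3", "U1,3,4", "U2,2,2", "U2,3,4", "U3,1,4", "U3,2,2"]
stage1 = run_stage1(lines)
print(stage1["U1"])  # [((2, 1), -2.0), ((3, 1), -1.0), ((3, 2), 1.0)]
print(stage1["U2"])  # [((3, 2), 2.0)]
```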
Mapper and Reducer - 2
 <U1>     Potter    Bat     Spider   <U2>      Potter   Bat   Spider

Potter                               Potter

 Bat           -2                     Bat      NULL

Spider         -1   1                Spider    NULL     2

Mapper2 (org.apache.hadoop.mapreduce.Mapper)
Reducer2 (DiffsToAveragesReducer)
Average Diffs, Count, StdDev
  <Aggregate>             Potter         Bat             Spider
       Potter
         Bat              -2, 1
       Spider             -1, 1         1.5, 2
In short, each <a,b> pair denotes a = average diff, b = count
Note: in practice we should use three matrices (average diff, count, StdDev); here I used 2.
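
The second stage is a plain group-and-aggregate, sketched below in Python. The input is the per-pair diffs from the two user matrices shown (U1 and U2). The function name is illustrative, and the use of the population standard deviation is an assumption; the slides do not say which estimator is used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def reduce_diffs_to_averages(pair_diffs):
    """Reducer 2: group diffs by item pair, emit (average diff, count, std dev)."""
    grouped = defaultdict(list)
    for pair, diff in pair_diffs:
        grouped[pair].append(diff)
    return {pair: (mean(diffs), len(diffs), pstdev(diffs))
            for pair, diffs in grouped.items()}

# Stage 1 output for the two users shown above.
stage1_output = [
    (("Bat", "Potter"), -2.0),     # U1
    (("Spider", "Potter"), -1.0),  # U1
    (("Spider", "Bat"), 1.0),      # U1
    (("Spider", "Bat"), 2.0),      # U2
]
aggregate = reduce_diffs_to_averages(stage1_output)
print(aggregate[("Spider", "Bat")])     # (1.5, 2, 0.5)
print(aggregate[("Spider", "Potter")])  # (-1.0, 1, 0.0)
```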
Predictions
        User        HarryPotter      Batman        Spiderman
         U1              5              3               4
         U2              ?              2               4
         U3              4              2               ?

  <Aggregate>          Potter            Bat             Spider
      Potter
       Bat              -2, 1
      Spider            -1, 1           1.5, 2
 In short, each <a,b> pair denotes a = average diff, b = count
 Note: in practice we should use three matrices (average diff, count, StdDev); here I used 2.


 Prediction(U3, Spiderman) = [(4-1)*1 + (2+1.5)*2] / (1+2)
                           ≈ 3.33
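
Written in general form, the calculation above is the weighted Slope One scheme from the Lemire and Maclachlan paper cited earlier. The symbols are notation chosen here for illustration: r_{u,i} is user u's rating of item i, dev_{j,i} is the average of (rating of j minus rating of i) over co-rating users, c_{j,i} is the count, and S(u) is the set of items u has rated.

```latex
P(u, j) =
  \frac{\sum_{i \in S(u),\, i \neq j} \left( \mathrm{dev}_{j,i} + r_{u,i} \right) c_{j,i}}
       {\sum_{i \in S(u),\, i \neq j} c_{j,i}}

% Worked instance for u = U3, j = Spiderman:
P(\mathrm{U3}, \mathrm{Spiderman})
  = \frac{(-1 + 4) \cdot 1 + (1.5 + 2) \cdot 2}{1 + 2}
  = \frac{10}{3} \approx 3.33
```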
Experiments

• Data

• Hadoop Setup

• Running Performances




Experiment Setup
Data: MovieLens-1M ratings
       # of users:     6,040
       # of movies:    3,900
       # of ratings:   1,000,209

Density of the ratings:
       each user has at least 20 ratings
       obviously, some users have many more ratings

Rating format: UserID, ItemID, Rating (scale 1-5)

Data Split: 80% training, 20% testing
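
The slides do not say how the 80/20 split was drawn. One common choice, assumed here purely for illustration, is a seeded random split over individual rating records; split_ratings is a hypothetical helper, not part of Mahout.

```python
import random

def split_ratings(records, train_fraction=0.8, seed=42):
    """Shuffle the rating records reproducibly and cut them into train/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy records in the (UserID, ItemID, Rating) format described above.
records = [(user, item, 1 + (user + item) % 5)
           for user in range(1, 6) for item in range(1, 3)]
train, test = split_ratings(records)
print(len(train), len(test))  # 8 2
```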
Experiment Setup
Hadoop Cluster Setup
 • IBM SmartCloud
 • 1 master node, 7 slave nodes
 • Each node runs SUSE Linux Enterprise Server v11 SP1
 • Server Configuration:
   64 bit (vCPU: 2, RAM: 4 GiB, Disk: 60 GiB)
 • Hadoop v.0.20.205.0
 • Mahout distribution-0.6

The environment setup follows the typical workflow described at:
http://irecsys.blogspot.com/2012/11/configurate-map-reduce-environment-on.html

Thanks Scott Young, neat writeup!!
Experimental Analyses
Stage-1: SlopeOneAverageDiffsJob by Map-Reduce
         Goal: Build DiffStorage
         Output: DiffStorage txt file, 1.45GB
         Running Time:
            real     13m 34.228s
            user     0m 5.136s
            sys      0m 1.028s
        Item1     Item2     Diff     Count    StdDev
         221      223       -1.02     197       0.5
Stage-2: Java evaluator to measure MAE on testing set
         Running Time:
            Load Testing Set (21K records), 299ms
            Load Training Set (79K records), 1,771ms
            Load DiffStorage, 176,352ms = 2.9m
            Prediction (21K records), 18,182ms = 0.3m
            MAE = 0.71330756
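
For reference, MAE is the mean of the absolute differences between predicted and actual ratings. The sketch below is a generic Python version, not the Java evaluator used in the experiment; the sample pairs are made up, and skipping cases with no prediction is an assumption.

```python
def mean_absolute_error(pairs):
    """pairs: iterable of (predicted, actual); cases with no prediction are skipped."""
    errors = [abs(predicted - actual)
              for predicted, actual in pairs
              if predicted is not None]
    if not errors:
        raise ValueError("no predictions to evaluate")
    return sum(errors) / len(errors)

# Hypothetical (predicted, actual) pairs; the last one has no prediction.
example = [(3.5, 4), (3.0, 3), (4.5, 5), (None, 2)]
mae = mean_absolute_error(example)
print(round(mae, 4))  # 0.3333
```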
Experimental Experiences
1. Why not MovieLens 10M data?
   Map-Reduce on 10M data may take several hours;
   Running time depends on the cluster and its configuration;
   Also, the DiffStorage file would be too large.
2. Java Evaluator
   Loading the full DiffStorage file is time-consuming.
   It also incurs Java heap space and "GC overhead limit exceeded" errors;
   Those errors cannot be fixed by -Xmx or other settings;
   Two solutions:
     1). Use simple weighting only and discard StdDev weighting.
     2). Write a simple Mapper and Reducer and run the evaluation on the cluster.

   For MovieLens 1M, this is not that efficient compared with
   the live SlopeOne recommendation; 10M data may be a
   better fit, and I will try MovieLens-10M later; Slope One is
   simple but memory-expensive.
More …

• Drive Mahout on Hadoop

• Interesting Communities




Mahout + Hadoop
How to run more Mahout algorithms on Hadoop?
1. Pre-set Commands in Mahout
  Run bin/mahout --help; it prints a list of
  available programs such as svd, fkmeans, etc.

  Some are basic functions, such as splitDataset
  Some can be executed as Hadoop tasks

  e.g. Run and evaluate Matrix Factorization on a rating dataset

  bin/mahout parallelALS --input inputSource --output outputSource \
    --tempDir tmpFolder --numFeatures 20 --numIterations 10

  bin/mahout evaluateFactorization --input inputSource --output outputSource \
    --userFeatures als/out/U/ --itemFeatures als/out/M/ \
    --tempDir tmpFolder
Mahout + Hadoop
2. More Algorithms on Hadoop
  Mahout provides a way to run more of its algorithms as Hadoop jobs:

$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-<version>.jar \
  <Job Class> --recommenderClassName Class <OPTIONS>

   Which kinds of jobs does it support? Mahout implements several.




   Some popular ones:
   1).org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob
        --recommenderClassName ClassName
   2).org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
   3).org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob
   4).org.apache.mahout.cf.taste.hadoop.slopeone.SlopeOneAverageDiffsJob
Interesting Communities
Beyond Hadoop and Mahout official sites

1. Data Mining
  KDnuggets, http://www.kdnuggets.com
  Popular community for Data Mining & Analytics. Lots of useful
  information, such as news, materials, datasets, jobs, etc.

2. Big Data
  SmartData Collective, http://smartdatacollective.com/
  Smarter Computing, http://www.smartercomputingblog.com/
  Big Data Meetup, http://big-data.meetup.com/

3. Recommender Systems
  ACM Official Site, http://recsys.acm.org/
  RecSys Wiki, http://recsyswiki.com/
Thank You!



Weitere ähnliche Inhalte

Was ist angesagt?

Scalable community detection with the louvain algorithm
Scalable community detection with the louvain algorithmScalable community detection with the louvain algorithm
Scalable community detection with the louvain algorithmNavid Sedighpour
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social MediaSymeon Papadopoulos
 
Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Cataldo Musto
 
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...ssuser610732
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systemsFalitokiniaina Rabearison
 
Beyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingBeyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingPierre Gutierrez
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector MachineShao-Chuan Wang
 
WEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMWEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMSai Kumar Ale
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsNitish Upreti
 
DIABETES PREDICTION SYSTEM .pptx
DIABETES PREDICTION SYSTEM .pptxDIABETES PREDICTION SYSTEM .pptx
DIABETES PREDICTION SYSTEM .pptxHome
 
Diabetes prediction using machine learning
Diabetes prediction using machine learningDiabetes prediction using machine learning
Diabetes prediction using machine learningdataalcott
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introductionLiang Xiang
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemAkshat Thakar
 
Predicting Diabetes Using Machine Learning
Predicting Diabetes Using Machine LearningPredicting Diabetes Using Machine Learning
Predicting Diabetes Using Machine LearningJohn Alex
 
Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesDaniel Valcarce
 
ML - Multiple Linear Regression
ML - Multiple Linear RegressionML - Multiple Linear Regression
ML - Multiple Linear RegressionAndrew Ferlitsch
 
Win runner testing tool
Win runner testing toolWin runner testing tool
Win runner testing toolmansirajpara
 

Was ist angesagt? (20)

Scalable community detection with the louvain algorithm
Scalable community detection with the louvain algorithmScalable community detection with the louvain algorithm
Scalable community detection with the louvain algorithm
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social Media
 
Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014
 
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems
 
Beyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingBeyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modeling
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
 
WEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMWEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEM
 
Web crawler
Web crawlerWeb crawler
Web crawler
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platforms
 
DIABETES PREDICTION SYSTEM .pptx
DIABETES PREDICTION SYSTEM .pptxDIABETES PREDICTION SYSTEM .pptx
DIABETES PREDICTION SYSTEM .pptx
 
Diabetes prediction using machine learning
Diabetes prediction using machine learningDiabetes prediction using machine learning
Diabetes prediction using machine learning
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introduction
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Predicting Diabetes Using Machine Learning
Predicting Diabetes Using Machine LearningPredicting Diabetes Using Machine Learning
Predicting Diabetes Using Machine Learning
 
Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slides
 
ML - Multiple Linear Regression
ML - Multiple Linear RegressionML - Multiple Linear Regression
ML - Multiple Linear Regression
 
Win runner testing tool
Win runner testing toolWin runner testing tool
Win runner testing tool
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
 

Andere mochten auch

American Express OPEN: Announcing the new OPEN Forum by Scott Roen
American Express OPEN: Announcing the new OPEN Forum by Scott RoenAmerican Express OPEN: Announcing the new OPEN Forum by Scott Roen
American Express OPEN: Announcing the new OPEN Forum by Scott RoenLinkedIn
 
The good the bad and the ugly - final
The good the bad and the ugly - finalThe good the bad and the ugly - final
The good the bad and the ugly - finalAndre Verschelling
 
Des maths et des recommandations - Devoxx 2014
Des maths et des recommandations - Devoxx 2014Des maths et des recommandations - Devoxx 2014
Des maths et des recommandations - Devoxx 2014Loïc Knuchel
 
PhD defense - Exploiting distributional semantics for content-based and conte...
PhD defense - Exploiting distributional semantics for content-based and conte...PhD defense - Exploiting distributional semantics for content-based and conte...
PhD defense - Exploiting distributional semantics for content-based and conte...Victor Codina
 
Le temps réel au coeur de toutes les stratégies digitales
Le temps réel au coeur de toutes les stratégies digitales Le temps réel au coeur de toutes les stratégies digitales
Le temps réel au coeur de toutes les stratégies digitales Netwave
 
Case Study Amex
Case Study AmexCase Study Amex
Case Study AmexFM Signal
 
Example: movielens data with mahout
Example: movielens data with mahoutExample: movielens data with mahout
Example: movielens data with mahoutGregg Barrett
 
American Express Case Study
American Express Case StudyAmerican Express Case Study
American Express Case StudyShivani Chavan
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutAmbarish Hazarnis
 
Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformIMC Institute
 
Strategic Brand Assessment - Amex
Strategic Brand Assessment - AmexStrategic Brand Assessment - Amex
Strategic Brand Assessment - AmexHenry Jenkins
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using MahoutIMC Institute
 
The Best in Financial Services Content Marketing
The Best in Financial Services Content MarketingThe Best in Financial Services Content Marketing
The Best in Financial Services Content MarketingNewsCred
 
American express case study
American express case studyAmerican express case study
American express case studyChinmoy Nanda
 
Strategic management - lowes home improvement case study
Strategic management - lowes home improvement case studyStrategic management - lowes home improvement case study
Strategic management - lowes home improvement case studySarah Lee
 
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation ApproachYONG ZHENG
 
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...Alan Said
 
The Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of ServerlessThe Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of ServerlessPipedrive
 
Yrecommender, machine learning sur Hybris
Yrecommender, machine learning sur HybrisYrecommender, machine learning sur Hybris
Yrecommender, machine learning sur HybrisGuillaume Kpotufe
 

Andere mochten auch (20)

American Express OPEN: Announcing the new OPEN Forum by Scott Roen
American Express OPEN: Announcing the new OPEN Forum by Scott RoenAmerican Express OPEN: Announcing the new OPEN Forum by Scott Roen
American Express OPEN: Announcing the new OPEN Forum by Scott Roen
 
The good the bad and the ugly - final
The good the bad and the ugly - finalThe good the bad and the ugly - final
The good the bad and the ugly - final
 
Des maths et des recommandations - Devoxx 2014
Des maths et des recommandations - Devoxx 2014Des maths et des recommandations - Devoxx 2014
Des maths et des recommandations - Devoxx 2014
 
PhD defense - Exploiting distributional semantics for content-based and conte...
PhD defense - Exploiting distributional semantics for content-based and conte...PhD defense - Exploiting distributional semantics for content-based and conte...
PhD defense - Exploiting distributional semantics for content-based and conte...
 
Le temps réel au coeur de toutes les stratégies digitales
Le temps réel au coeur de toutes les stratégies digitales Le temps réel au coeur de toutes les stratégies digitales
Le temps réel au coeur de toutes les stratégies digitales
 
Case Study Amex
Case Study AmexCase Study Amex
Case Study Amex
 
Example: movielens data with mahout
Example: movielens data with mahoutExample: movielens data with mahout
Example: movielens data with mahout
 
American Express Case Study
American Express Case StudyAmerican Express Case Study
American Express Case Study
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache Mahout
 
Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud Platform
 
Strategic Brand Assessment - Amex
Strategic Brand Assessment - AmexStrategic Brand Assessment - Amex
Strategic Brand Assessment - Amex
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using Mahout
 
The Best in Financial Services Content Marketing
The Best in Financial Services Content MarketingThe Best in Financial Services Content Marketing
The Best in Financial Services Content Marketing
 
Big Data en Retail
Big Data en RetailBig Data en Retail
Big Data en Retail
 
American express case study
American express case studyAmerican express case study
American express case study
 
Strategic management - lowes home improvement case study
Strategic management - lowes home improvement case studyStrategic management - lowes home improvement case study
Strategic management - lowes home improvement case study
 
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
[IUI 2017] Criteria Chains: A Novel Multi-Criteria Recommendation Approach
 
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
 
The Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of ServerlessThe Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of Serverless
 
Yrecommender, machine learning sur Hybris
Yrecommender, machine learning sur HybrisYrecommender, machine learning sur Hybris
Yrecommender, machine learning sur Hybris
 

Ähnlich wie Slope one recommender on hadoop

Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...Session-Based Recommendations with Recurrent Neural Networks(Balazs Hidasi, ...
Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, ...hyunsung lee
 
Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Sri Ambati
 
Productive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseProductive Use of the Apache Spark Prompt with Sam Penrose
Productive Use of the Apache Spark Prompt with Sam PenroseDatabricks
 
Benchmarking Perl (Chicago UniForum 2006)
Benchmarking Perl (Chicago UniForum 2006)Benchmarking Perl (Chicago UniForum 2006)
Benchmarking Perl (Chicago UniForum 2006)brian d foy
 
Deliberately Planning and Acting for Angry Birds with Refinement Methods
Deliberately Planning and Acting for Angry Birds with Refinement MethodsDeliberately Planning and Acting for Angry Birds with Refinement Methods
Deliberately Planning and Acting for Angry Birds with Refinement MethodsRuofei Du
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science ChallengeMark Nichols, P.E.
 
Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...Jeong-Gwan Lee
 
[SOCRS2013]Differential Context Modeling in Collaborative Filtering
[SOCRS2013]Differential Context Modeling in Collaborative Filtering[SOCRS2013]Differential Context Modeling in Collaborative Filtering
[SOCRS2013]Differential Context Modeling in Collaborative FilteringYONG ZHENG
 
Face recognition v1
Face recognition v1Face recognition v1
Face recognition v1San Kim
 
Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - PyData
 
MapReduce: teoria e prática
MapReduce: teoria e práticaMapReduce: teoria e prática
MapReduce: teoria e práticaPET Computação
 
Benchmarking Perl Lightning Talk (NPW 2007)
Benchmarking Perl Lightning Talk (NPW 2007)Benchmarking Perl Lightning Talk (NPW 2007)
Benchmarking Perl Lightning Talk (NPW 2007)brian d foy
 
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkImplementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkDalei Li
 
Throttling Malware Families in 2D
Throttling Malware Families in 2DThrottling Malware Families in 2D
Throttling Malware Families in 2DMohamed Nassar
 

Slope one recommender on hadoop

  • 1. Slope One Recommender on Hadoop
       YONG ZHENG, Center for Web Intelligence, DePaul University
       Nov 15, 2012
  • 2. Overview
       • Introduction
       • Recommender Systems & Slope One Recommender
       • Distributed Slope One on Mahout and Hadoop
       • Experimental Setup and Analyses
       • Drive Mahout on Hadoop
       • Interesting Communities
       Center for Web Intelligence, DePaul University, USA
  • 3. Introduction
       • About Me: a recommendation guy
       • My Research: data mining and recommender systems
       • Typical Experimental Research
         1) Design or improve an algorithm;
         2) Run the algorithm and baseline algorithms on datasets;
         3) Compare experimental results;
         4) Try different parameters, find the reasons, and even re-design and improve the algorithm itself;
         5) Run the algorithm and baseline algorithms on datasets;
         6) Compare experimental results;
         7) Try different parameters, find the reasons, and even re-design and improve the algorithm itself;
         8) And so on, until the results approach what is expected.
  • 4. Introduction
       • Sometimes the data is large-scale. For example, one algorithm may take days to complete. What if the experimental results are not as expected? Then we improve the algorithm and run it for days again, and again.
         What could we do previously? (for tasks that are not that complicated)
         1) Parallelize by hand, which means complicated synchronization and limited resources such as CPU and memory;
         2) Take advantage of PC labs: run it on 10 PCs.
       • Nearly all research will ultimately face large-scale problems, especially in the domain of data mining.
       • But we have Map-Reduce NOW!
  • 5. Introduction
       • We do not need to distribute data and tasks manually. Instead, we simply generate configurations.
       • We do not need to care about the details, e.g. how data is distributed, when a specific task will be run on which machine, or how the machines conduct tasks one by one.
       • Instead, we pre-define the workflow and take advantage of the functional contributions of mappers and reducers.
       • More benefits: replication, balancing, robustness, etc.
  • 6. Recommender Systems
       • Collaborative Filtering
       • Slope One and Simple Weighted Slope One
       • Slope One in Mahout
       • Distributed Slope One in Mahout
       • Mappers and Reducers
  • 8. Collaborative Filtering (CF)
       One of the most popular recommendation algorithms.
       • User-based: User-CF
       • Item-based: Item-CF, Slope One
       [Figure: Example of user-based collaborative filtering]
  • 9. Slope One Recommender
       Reference: Daniel Lemire, Anna Maclachlan, Slope One Predictors for Online Rating-Based Collaborative Filtering, In SIAM Data Mining (SDM'05), April 21-23, 2005. http://lemire.me/fr/abstracts/SDM2005.html

       User  Batman  Spiderman
       U1    3       4
       U2    2       4
       U3    2       ?

       1) How differently were the two movies rated?
          U1 rated Spiderman higher by (4-3) = 1
          U2 rated Spiderman higher by (4-2) = 2
          On average, Spiderman is rated (1+2)/2 = 1.5 higher
       2) The rating difference gives the prediction.
          Since U3 gave Batman 2 stars, he will probably rate Spiderman (2+1.5) = 3.5 stars.
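The two steps on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Mahout code; the function names `average_diff` and `predict` are made up for the example.

```python
# Ratings from the slide: user -> {movie: rating}; U3 has not rated Spiderman.
ratings = {
    "U1": {"Batman": 3, "Spiderman": 4},
    "U2": {"Batman": 2, "Spiderman": 4},
    "U3": {"Batman": 2},
}

def average_diff(ratings, target, source):
    """Average of (target - source) over the users who rated both movies."""
    diffs = [r[target] - r[source] for r in ratings.values()
             if target in r and source in r]
    return sum(diffs) / len(diffs)

def predict(ratings, user, target, source):
    """Slope One prediction: the user's rating of source plus the average diff."""
    return ratings[user][source] + average_diff(ratings, target, source)

dev = average_diff(ratings, "Spiderman", "Batman")
pred = predict(ratings, "U3", "Spiderman", "Batman")
print(dev, pred)  # 1.5 3.5
```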
  • 10. Simple Weighted Slope One
       Usually a user has rated multiple items.

       User  HarryPotter  Batman  Spiderman
       U1    5            3       4
       U2    ?            2       4
       U3    4            2       ?

       1) How differently were the two movies rated?
          Diff(Batman, Spiderman) = [(4-3)+(4-2)]/2 = 1.5
          Diff(HarryPotter, Spiderman) = (4-5)/1 = -1
          The denominators 2 and 1 are called the "count": the number of users who rated both movies.
       2) Weighted rating differences give the prediction for U3 on Spiderman.
          We use a simple weighted approach:
          Based on Batman only, rating = 2+1.5 = 3.5
          Based on HarryPotter only, rating = 4-1 = 3
          Considering both, predicted rating = (3.5*2 + 3*1)/(2+1) = 3.33
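A minimal Python sketch of the simple weighted scheme, using the three-user table from this slide. The helper names are hypothetical and the code is meant only to reproduce the 3.33 figure, weighting each per-item prediction by its count.

```python
ratings = {
    "U1": {"HarryPotter": 5, "Batman": 3, "Spiderman": 4},
    "U2": {"Batman": 2, "Spiderman": 4},
    "U3": {"HarryPotter": 4, "Batman": 2},
}

def diff_and_count(ratings, target, source):
    """Average (target - source) and the number of users who rated both."""
    diffs = [r[target] - r[source] for r in ratings.values()
             if target in r and source in r]
    if not diffs:
        return None, 0
    return sum(diffs) / len(diffs), len(diffs)

def weighted_slope_one(ratings, user, target):
    """Weight each single-item prediction by its co-rating count."""
    numerator, denominator = 0.0, 0
    for source, rating in ratings[user].items():
        if source == target:
            continue
        avg, count = diff_and_count(ratings, target, source)
        if count == 0:
            continue  # nobody co-rated this pair
        numerator += (rating + avg) * count
        denominator += count
    return numerator / denominator if denominator else None

prediction = weighted_slope_one(ratings, "U3", "Spiderman")
print(round(prediction, 2))  # 3.33
```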
  • 11. Simple Weighted Slope One

       User  HarryPotter  Batman  Spiderman
       u1    5            3       4
       u2    ?            2       4
       u3    4            2       ?

       Question: Online or Offline?
       To calculate the predicted ratings, we need 2 matrices:
       1) Difference Matrix

                  Movie1  Movie2  Movie3  Movie4
          Movie1
          Movie2  -1.5
          Movie3   2       1
          Movie4  -1       0.5    -2

       2) Count Matrix
          The number of users who co-rated each pair of items.
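Both matrices depend only on the training ratings, so they can be built once offline. The sketch below builds them as dictionaries keyed by item pair for the three-movie example; storing only one triangle (pairs in alphabetical order) is a choice made for this illustration.

```python
from collections import defaultdict
from itertools import combinations

ratings = {
    "u1": {"HarryPotter": 5, "Batman": 3, "Spiderman": 4},
    "u2": {"Batman": 2, "Spiderman": 4},
    "u3": {"HarryPotter": 4, "Batman": 2},
}

def build_matrices(ratings):
    """Return (difference matrix, count matrix) keyed by (item_a, item_b).

    For a pair (a, b) the stored difference is the average of rating(b) - rating(a).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user_ratings in ratings.values():
        for a, b in combinations(sorted(user_ratings), 2):
            sums[(a, b)] += user_ratings[b] - user_ratings[a]
            counts[(a, b)] += 1
    diffs = {pair: sums[pair] / counts[pair] for pair in sums}
    return diffs, dict(counts)

diffs, counts = build_matrices(ratings)
for pair in sorted(diffs):
    print(pair, diffs[pair], counts[pair])
```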
  • 12. Slope One in Mahout
       Mahout is an open-source machine learning library.
       1) Recommendation algorithms: User-based CF, Item-based CF, Slope One, etc.
       2) Clustering: KMeans, Fuzzy KMeans, etc.
       3) Classification: Decision Trees, Naive Bayes, SVM, etc.
       4) Latent Factor Models: LDA, SVD, Matrix Factorization, etc.
  • 13. Slope One in Mahout
       org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender

       Pre-Processing Stage: (class MemoryDiffStorage with Map)
         for every item i
           for every other item j
             for every user u expressing preference for both i and j
               add the difference in u's preference for i and j to an average

       Recommendation Stage:
         for every item i the user u expresses no preference for
           for every item j that user u expresses a preference for
             find the average preference difference between j and i
             add this diff to u's preference value for j
             add this to a running average
         return the top items, ranked by these averages

       Simple weighting: as introduced previously
       StdDev weighting: item-item rating diffs with a lower standard deviation should be weighted more heavily
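The two stages of the pseudocode can be mirrored in Python roughly as follows. This is a sketch of the idea with simple (count-based) weighting, not a port of Mahout's Java classes; `preprocess` and `recommend` are names invented here, and StdDev weighting is left out.

```python
from collections import defaultdict

prefs = {
    "U1": {"HarryPotter": 5, "Batman": 3, "Spiderman": 4},
    "U2": {"Batman": 2, "Spiderman": 4},
    "U3": {"HarryPotter": 4, "Batman": 2},
}

def preprocess(prefs):
    """Pre-processing stage: average diff (i - j) and count for each ordered pair."""
    total = defaultdict(float)
    count = defaultdict(int)
    for user_prefs in prefs.values():
        for i, ri in user_prefs.items():
            for j, rj in user_prefs.items():
                if i != j:
                    total[(i, j)] += ri - rj
                    count[(i, j)] += 1
    avg = {pair: total[pair] / count[pair] for pair in total}
    return avg, dict(count)

def recommend(prefs, user, avg, count, how_many=10):
    """Recommendation stage: score every unrated item, return the top ones."""
    all_items = {item for p in prefs.values() for item in p}
    rated = prefs[user]
    scores = {}
    for i in all_items - set(rated):
        numerator = denominator = 0.0
        for j, rj in rated.items():
            if (i, j) in avg:
                numerator += (rj + avg[(i, j)]) * count[(i, j)]
                denominator += count[(i, j)]
        if denominator:
            scores[i] = numerator / denominator
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:how_many]

avg, count = preprocess(prefs)
top_for_u3 = recommend(prefs, "U3", avg, count)
print(top_for_u3)
```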
  • 14. Distributed Slope One in Mahout
       As in our previous practice (e.g. the matrix factorization process), what we need here is the Difference Matrix.
       Suppose M users rated N items; the matrix then requires N(N-1)/2 cells.
       Density is another aspect: how many items each user rated. With many items and a dense rating matrix, the computational cost increases accordingly.
       Question again: Online or Offline?
       It depends on the task and the data. For large-scale data, let's do it offline!
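To get a feel for the N(N-1)/2 figure, here is a small calculation. The 3,900 value is the movie count quoted for MovieLens-1M in the experiment setup; the result is an upper bound, since only pairs that some user actually co-rated need a cell.

```python
def diff_matrix_cells(n_items):
    """Number of unordered item pairs, i.e. cells in one triangle of the matrix."""
    return n_items * (n_items - 1) // 2

print(diff_matrix_cells(3))     # 3 pairs for the three-movie example
print(diff_matrix_cells(3900))  # 7603050 pairs for 3,900 movies
```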
  • 15. Distributed Slope One in Mahout
       package org.apache.mahout.cf.taste.hadoop.slopeone;
         class SlopeOneAverageDiffsJob
         class SlopeOnePrefsToDiffsReducer
         class SlopeOneDiffsToAveragesReducer
       package org.apache.mahout.cf.taste.hadoop;
         class ToItemPrefsMapper
       org.apache.hadoop.mapreduce.Mapper

       Two Mapper-Reducer Stages:
       1) Create the DiffMatrix for each user
       2) Collect AvgDiff info, counts, StdDev
       Let's see how it works…
  • 16. Mapper and Reducer - 1

       User  HarryPotter  Batman  Spiderman
       U1    5            3       4
       U2    ?            2       4
       U3    4            2       ?

       Mapper1 (ToItemPrefsMapper) -> <UserID, Pair<ItemID, Rating>>
       Reducer1 (PrefsToDiffsReducer) -> <Pair<Item1,Item2>, Diff> (for all three users)

       <U1>    Potter  Bat  Spider
       Potter
       Bat     -2
       Spider  -1      1

       <U2>    Potter  Bat  Spider
       Potter
       Bat     NULL
       Spider  NULL    2
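The first stage can be simulated in plain Python to see what each step emits. This is a single-process sketch of the data flow, not Hadoop or Mahout code; the numeric item IDs are hypothetical and only fix the order of each pair, and each diff is computed as the second item's rating minus the first's, matching the sign convention of the per-user matrices.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical numeric item IDs, used only to give item pairs a fixed order.
ITEM_ID = {"HarryPotter": 1, "Batman": 2, "Spiderman": 3}

records = [
    ("U1", "HarryPotter", 5), ("U1", "Batman", 3), ("U1", "Spiderman", 4),
    ("U2", "Batman", 2), ("U2", "Spiderman", 4),
    ("U3", "HarryPotter", 4), ("U3", "Batman", 2),
]

def map_to_item_prefs(record):
    """Mapper 1: key each rating by user."""
    user, item, rating = record
    return user, (item, rating)

def reduce_prefs_to_diffs(item_prefs):
    """Reducer 1: for one user, emit ((item_a, item_b), rating_b - rating_a)."""
    ordered = sorted(item_prefs, key=lambda pref: ITEM_ID[pref[0]])
    return [((a, b), rb - ra) for (a, ra), (b, rb) in combinations(ordered, 2)]

# Shuffle phase: group the mapper output by user.
grouped = defaultdict(list)
for record in records:
    user, pref = map_to_item_prefs(record)
    grouped[user].append(pref)

stage1_output = []
for user in sorted(grouped):
    stage1_output.extend(reduce_prefs_to_diffs(grouped[user]))

for pair, diff in stage1_output:
    print(pair, diff)
```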
  • 17. Mapper and Reducer - 2

       <U1>    Potter  Bat  Spider
       Potter
       Bat     -2
       Spider  -1      1

       <U2>    Potter  Bat  Spider
       Potter
       Bat     NULL
       Spider  NULL    2

       Mapper2 (org.apache.hadoop.mapreduce.Mapper)
       Reducer2 (DiffsToAveragesReducer): Average Diffs, Count, StdDev

       <Aggregate>  Potter  Bat     Spider
       Potter
       Bat          -2, 2
       Spider       -1, 1   1.5, 2

       Each <a,b> pair denotes a = average diff, b = count. The Bat/Potter count is 2 because both U1 and U3 rated the two movies.
       Note: in practice there are three matrices (average diff, count, StdDev); only two are shown here.
  • 18. Predictions

       User  HarryPotter  Batman  Spiderman
       U1    5            3       4
       U2    ?            2       4
       U3    4            2       ?

       <Aggregate>  Potter  Bat     Spider
       Potter
       Bat          -2, 2
       Spider       -1, 1   1.5, 2

       Each <a,b> pair denotes a = average diff, b = count.
       Note: in practice there are three matrices (average diff, count, StdDev); only two are shown here.

       Prediction(U3, Spiderman) = [(4-1)*1 + (2+1.5)*2] / (1+2) ≈ 3.33
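The second stage and the final prediction can be simulated the same way. The sketch starts from the per-user diffs of all three users (so the HarryPotter/Batman pair, co-rated by U1 and U3, gets a count of 2), groups them by item pair, and reduces each group to average diff, count, and StdDev. Using the population standard deviation is an assumption of this sketch; the slides do not say which variant Mahout computes.

```python
import math
from collections import defaultdict

# Stage-1 output: ((item_a, item_b), rating_b - rating_a), one entry per user and pair.
stage1_output = [
    (("HarryPotter", "Batman"), -2),     # U1
    (("HarryPotter", "Spiderman"), -1),  # U1
    (("Batman", "Spiderman"), 1),        # U1
    (("Batman", "Spiderman"), 2),        # U2
    (("HarryPotter", "Batman"), -2),     # U3
]

def reduce_diffs_to_averages(diffs):
    """Reducer 2: average diff, count and (population) standard deviation."""
    count = len(diffs)
    mean = sum(diffs) / count
    variance = sum((d - mean) ** 2 for d in diffs) / count
    return mean, count, math.sqrt(variance)

# Mapper 2 is an identity mapper; the shuffle groups diffs by item pair.
grouped = defaultdict(list)
for pair, diff in stage1_output:
    grouped[pair].append(diff)
aggregate = {pair: reduce_diffs_to_averages(d) for pair, d in grouped.items()}

def predict(user_ratings, target, aggregate):
    """Simple weighted Slope One prediction from the aggregated matrices."""
    numerator = denominator = 0.0
    for item, rating in user_ratings.items():
        if (item, target) in aggregate:      # stored diff is target - item
            mean, count, _ = aggregate[(item, target)]
            numerator += (rating + mean) * count
            denominator += count
        elif (target, item) in aggregate:    # stored diff is item - target
            mean, count, _ = aggregate[(target, item)]
            numerator += (rating - mean) * count
            denominator += count
    return numerator / denominator if denominator else None

u3 = {"HarryPotter": 4, "Batman": 2}
prediction = predict(u3, "Spiderman", aggregate)
print(aggregate)
print(round(prediction, 2))  # 3.33
```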
  • 19. Experiments
       • Data
       • Hadoop Setup
       • Running Performances
  • 20. Experiment Setup
       Data: MovieLens-1M ratings
         # of users: 6,040
         # of movies: 3,900
         # of ratings: 1,000,209
       Density of the ratings: each user has at least 20 ratings; some users have many more.
       Rating format: UserID, ItemID, Rating (scale 1-5)
       Data Split: 80% training, 20% testing
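As a rough illustration of these numbers, the snippet below computes the overall density of the rating matrix from the counts on the slide (about 4.2%) and shows one possible way to make an 80/20 split. The slides do not say how the split was produced; a seeded random shuffle is assumed here, and `split_ratings` is a made-up helper.

```python
import random

n_users, n_movies, n_ratings = 6040, 3900, 1000209
density = n_ratings / (n_users * n_movies)
print(f"density = {density:.4f}")

def split_ratings(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy data in the (UserID, ItemID, Rating) format.
toy = [(u, i, 3) for u in range(10) for i in range(10)]
train, test = split_ratings(toy)
print(len(train), len(test))  # 80 20
```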
  • 21. Experiment Setup
       Hadoop Cluster Setup
       • IBM SmartCloud
       • 1 master node, 7 slave nodes
       • Each node runs SUSE Linux Enterprise Server v11 SP1
       • Server Configuration: 64 bit (vCPU: 2, RAM: 4 GiB, Disk: 60 GiB)
       • Hadoop v.0.20.205.0
       • Mahout distribution-0.6
       The environment setup follows the typical workflow described at:
       http://irecsys.blogspot.com/2012/11/configurate-map-reduce-environment-on.html
       Thanks Scott Young, neat writeup!!
  • 22. Experimental Analyses
    Stage-1: SlopeOneAverageDiffsJob by Map-Reduce
      Goal: Build DiffStorage
      Output: DiffStorage txt file, 1.45 GB
      Running time:
        - real 13m 34.228s
        - user 0m 5.136s
        - sys 0m 1.028s

      Item1  Item2  Diff   Count  StdDev
      221    223    -1.02  197    0.5

    Stage-2: Java evaluator to measure MAE on the testing set
      Running time:
        - Load Testing Set (21K records): 299 ms
        - Load Training Set (79K records): 1,771 ms
        - Load DiffStorage: 176,352 ms = 2.9 min
        - Prediction (21K records): 18,182 ms = 0.3 min
        - MAE = 0.71330756
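For reference, MAE is the mean of |predicted - actual| over the test ratings. The sketch below is a plain-Python illustration, not the Java evaluator used in the experiment: `mean_absolute_error` is a hypothetical helper, the sample values are made up for illustration, and skipping test ratings that have no prediction is an assumption rather than something the slides state.

```python
def mean_absolute_error(pairs):
    """MAE over (predicted, actual) pairs; pairs without a prediction are skipped."""
    errors = [abs(pred - actual) for pred, actual in pairs if pred is not None]
    if not errors:
        raise ValueError("no predictions to evaluate")
    return sum(errors) / len(errors)


# Made-up (predicted, actual) pairs, for illustration only.
EXAMPLE_PAIRS = [(3.5, 4), (2.0, 2), (4.5, 3), (None, 5)]
MAE = mean_absolute_error(EXAMPLE_PAIRS)  # (0.5 + 0.0 + 1.5) / 3
```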
  • 23. Experimental Experiences
    1. Why not MovieLens 10M data?
       Map-Reduce on 10M data may take several hours;
       running time depends on the cluster and its configuration.
       Also, the DiffStorage file would be too large.
    2. Java Evaluator
       Loading the full DiffStorage file is time-consuming.
       It also triggers Java heap space and "GC overhead limit exceeded" errors;
       those errors could not be fixed by -Xmx or other settings.
       Two solutions:
       1). Just use simple weighting and discard StdDev weighting.
       2). Write a simple Mapper and Reducer and run the evaluation on the cluster.
    For MovieLens 1M, this is not that efficient compared with the live SlopeOne recommendation;
    10M data may fare better, and I will try MovieLens-10M later.
    Slope One is simple but memory-expensive.
  • 24. More …
    • Drive Mahout on Hadoop
    • Interesting Communities
  • 25. Mahout + Hadoop
    How do we run more Mahout algorithms on Hadoop?
    1. Pre-set Commands in Mahout
       Run bin/mahout --help; it lists the available programs, such as svd, fkmeans, etc.
       Some are basic functions, such as splitDataset.
       Some can be executed as Hadoop tasks.
       e.g. Run and evaluate Matrix Factorization on a rating dataset:

         bin/mahout parallelALS --input inputSource --output outputSource \
             --tempDir tmpFolder --numFeatures 20 --numIterations 10

         bin/mahout evaluateFactorization --input inputSource --output outputSource \
             --userFeatures als/out/U/ --itemFeatures als/out/M/ --tempDir tmpFolder
  • 26. Mahout + Hadoop
    2. More Algorithms on Hadoop
       Mahout provides a way to run more of its algorithms on Hadoop. Simply:

         $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-<version>.jar \
             <Job Class> --recommenderClassName Class <OPTIONS>

       Which kinds of jobs does it support? Mahout implements several. Some popular ones:
       1). org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob --recommenderClassName ClassName
       2). org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
       3). org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob
       4). org.apache.mahout.cf.taste.hadoop.slopeone.SlopeOneAverageDiffsJob
  • 27. Interesting Communities
    Beyond the official Hadoop and Mahout sites:
    1. Data Mining
       KDnuggets, http://www.kdnuggets.com
       Popular community for Data Mining & Analytics.
       Lots of useful information, such as news, materials, datasets, jobs, etc.
    2. Big Data
       SmartData Collective, http://smartdatacollective.com/
       Smarter Computing, http://www.smartercomputingblog.com/
       Big Data Meetup, http://big-data.meetup.com/
    3. Recommender Systems
       ACM Official Site, http://recsys.acm.org/
       RecSys Wiki, http://recsyswiki.com/
  • 28. Thank You!