Partner Webinar: Recommendation Engines with MongoDB and Hadoop

K Young - CEO, Mortar
Recommendation Engines
with MongoDB and Hadoop

WHAT IS IT
Recommendation Engine
Recommendation engines automatically
recommend the "right" items for each user.
• Retail
• Music
• Videos
• Dating
• Etc…

EXAMPLES
LinkedIn: 50% of new connections come
from "People You May Know"
Netflix: 75% of content is viewed because
of a recommendation
Amazon: 35% of sales are driven by
recommendations

FOR THIS WEBINAR
Agenda
1. Recommendation Engines
2. Hadoop
3. Demo: Build a Recommendation Engine
4. Your Recommendation Engine
5. Q&A

NOW GENERALLY AVAILABLE
• Open source, free
• Very flexible
• Massively scalable
• 100% customizable
• Tested and proven

HOW DO THEY WORK?
A technical implementation of how humans
make recommendations, using:
• past behavior
• similar users
• content metadata
• outside signals, e.g. Instagram

USER INTERACTIONS: SIGNALS

ITEM-ITEM RECOMMENDATIONS

USER-ITEM RECOMMENDATIONS

WHERE DO RECOMMENDATIONS APPEAR?
Landing page
Product page
Cart
Push email
Etc.

WHAT IT ISN’T
Predictions based on macro-trends, e.g.
what is trending on Twitter
Numeric predictions, e.g. price elasticity

A WARNING
Recommendation engines are famously
hard to launch because they touch
engineering, finance, product, and executive teams.
How to succeed:
1) speedy implementation (target 1 week)
2) engine flexibility
3) gradual roll-out
4) visible KPI-impact

RAPID OVERVIEW
Hadoop
Platform for distributed data processing.
Strengths:
• Can scale up to thousands of computers
• Widely used
• Very broadly applicable
• Free, open
Problem:
• Difficult to use for complex problems
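Classic Hadoop jobs follow the MapReduce model: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The single-process Python sketch below only imitates that model and is not Hadoop's API; run_mapreduce, plays_mapper and sum_reducer are hypothetical names, and the three play records are made-up sample data shaped like the Last.fm documents used in the demo.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


def run_mapreduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterator[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Imitate map -> shuffle -> reduce in a single process (not Hadoop's API)."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: aggregate each key's values independently.
    return {key: reducer(key, values) for key, values in groups.items()}


def plays_mapper(doc: dict) -> Iterator[Tuple[str, int]]:
    # Emit (artist, plays) for every listening record.
    yield doc["artist_name"], doc["num_plays"]


def sum_reducer(artist: str, counts: List[int]) -> int:
    # Total plays per artist.
    return sum(counts)


# Made-up sample records shaped like the demo's lastfm_plays documents.
plays = [
    {"user": "u1", "artist_name": "the beatles", "num_plays": 66},
    {"user": "u2", "artist_name": "the beatles", "num_plays": 10},
    {"user": "u1", "artist_name": "beastie boys", "num_plays": 67},
]
print(run_mapreduce(plays, plays_mapper, sum_reducer))
# {'the beatles': 76, 'beastie boys': 67}
```

On a real cluster the map and reduce calls run in parallel across many machines and the shuffle moves data over the network. Wiring that up by hand for every step is what makes raw MapReduce verbose for complex problems.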

ON HADOOP
Pig
Less code
Compiles to native Hadoop code
Popular
(LinkedIn, Twitter, Salesforce, Yahoo, Spotify...)

BRIEF, EXPRESSIVE
LIKE PROCEDURAL SQL
Pig
(thanks: Twitter's Hadoop World presentation)

FOR SERIOUS
The Same Script, In MapReduce

MOTIVATIONS
MongoDB + Pig
Data storage and data processing are often
separate concerns
Hadoop is built for scalable processing of large
datasets

SIMILAR PHILOSOPHY
MongoDB, Pig
Poly-structured data
• MongoDB: stores data, regardless of structure
• Pig: reads data, regardless of structure

SIMILAR PHILOSOPHY
MongoDB Hadoop Connector
Open source connector for Hadoop (and family)
to read from and write to MongoDB.
(Links at end).

Build a recommendation engine
ENOUGH PREAMBLE, NOW IT’S…
Demo Time!

DEMO AGENDA
1) Intro to Mortar
2) Download recommendation code
3) Hook up the demo implementation (Last.fm)
4) Generate recommendations at scale
5) View recommendations

DEMO
Use Mortar for demo
Free to use
Open, code runs anywhere
Complete tutorial online (link at end)

Mortar
FAST INTRO
Data science lacks a way to
organize, test, deploy, and collaborate with code.
So:
• One-button code deployment, powered by GitHub
• Award-winning job monitoring and visualization
• Realtime log collection and error analysis
• Free local development with one-click installation

>mortar projects:fork git@github.com:mortardata/mortar-recsys.git mortar_webinar_20140415
Sending request to register project: mortar_webinar_20140415... done
Status: Success!
Your project is ready for use. Type 'mortar help' to see the commands you can perform on the project.

DEFINITIONS
Users: People who interact with your
items and generate events that you
capture
Items: The things you are recommending:
videos, articles, products, etc.
Signal: A user-item interaction with a
weighting that tells us the relative value of
the interaction.
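To make the signal definition concrete, here is a minimal Python sketch that collapses raw (user, item, event type) events into one weighted signal per user-item pair. The event types and the EVENT_WEIGHTS values are hypothetical; choosing weights is a product decision, and the Last.fm demo simply uses play counts as the weight.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Hypothetical weights: how much each kind of interaction says about interest.
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


class Signal(NamedTuple):
    user: str
    item: str
    weight: float


def events_to_signals(events: Iterable[Tuple[str, str, str]]) -> List[Signal]:
    """Collapse (user, item, event_type) events into one weighted signal per user-item pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for user, item, event_type in events:
        # Event types without a weight contribute nothing.
        totals[(user, item)] += EVENT_WEIGHTS.get(event_type, 0.0)
    return [Signal(user, item, w) for (user, item), w in totals.items() if w > 0]


events = [
    ("alice", "boots", "view"),
    ("alice", "boots", "purchase"),
    ("bob", "boots", "view"),
    ("bob", "scarf", "hover"),  # unweighted event type, so no signal
]
print(events_to_signals(events))
# [Signal(user='alice', item='boots', weight=6.0), Signal(user='bob', item='boots', weight=1.0)]
```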

STEPS
Steps in a recommendation engine:
• Load your data
• Generate your signals
• Call code to generate recommendations
• Store your recommendations
Not covered today:
• Serve your recommendations
• Track KPI-impact

DEMO
17.5MM documents of 360K users’ top
played artists, provided by Last.fm at
http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html
We used a Pig job to load the data into a
MongoLab database.

>db.lastfm_plays.find()
{ "user" : "faf…a60", "num_plays" : 67,
"artist_name" : "beastie boys" }
{ "user" : "faf0…a60", "num_plays" : 66,
"artist_name" : "the beatles" }
{ "user" : "faf0…a60", "num_plays" : 65,
"artist_name" : "the smashing pumpkins" }

DEMO: LOAD THE DATA
First step: Load our listening data.

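%default DB 'mongo_webinar'
%default PLAYS_COLLECTION 'lastfm_plays'
-- $CONN has no default here, so it must be supplied at run time
-- (for example through the params file passed to mortar run).
raw_input =
load '$CONN/$DB.$PLAYS_COLLECTION'
using com.mongodb.hadoop.pig.MongoLoader('user:chararray, artist_name:chararray, num_plays:int');
Pig code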

DEMO: GENERATE SIGNALS
Now that we have our data loaded we need
to extract: user, item, signal.

user_signals = foreach raw_input generate
user,
artist_name as item,
num_plays as weight:int;
Pig code
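The same projection can be mirrored in plain Python, which can be handy for unit-testing signal logic on a handful of documents before starting a cluster. This is a sketch: lastfm_signals is a hypothetical helper, and skipping malformed documents is an assumption, since the Pig script above does no such filtering.

```python
from typing import Iterable, Iterator, Mapping, Tuple


def lastfm_signals(docs: Iterable[Mapping]) -> Iterator[Tuple[str, str, int]]:
    """Mirror the Pig foreach: user, artist_name as item, num_plays as weight."""
    for doc in docs:
        try:
            signal = (doc["user"], doc["artist_name"], int(doc["num_plays"]))
        except (KeyError, TypeError, ValueError):
            # Assumption: skip malformed documents instead of failing the whole run.
            continue
        yield signal


docs = [
    {"user": "u1", "artist_name": "beastie boys", "num_plays": 67},
    {"user": "u1", "artist_name": "the beatles"},  # missing num_plays
]
print(list(lastfm_signals(docs)))
# [('u1', 'beastie boys', 67)]
```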

DEMO: CALL MORTAR
Now that the data is in the correct format
we’ll call the mortar algorithms for
generating item-item and user-item
recommendations.

item_item_recs = recsys__GetItemItemRecommendations(user_signals);
user_item_recs =
recsys__GetUserItemRecommendations(user_signals, item_item_recs);
Pig code
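For intuition only, here is a simple weighted co-occurrence recommender in Python. It is not the algorithm behind recsys__GetItemItemRecommendations or recsys__GetUserItemRecommendations, and its function names and scoring rules are assumptions. Two items are linked whenever the same user has signals for both, credited with the smaller of that user's two weights; user-item recommendations then follow each user's items to their neighbors, skipping items the user already has.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Tuple

Signal = Tuple[str, str, float]  # (user, item, weight)
Ranked = Dict[str, List[Tuple[str, float]]]


def _group_by_user(signals: Iterable[Signal]) -> Dict[str, Dict[str, float]]:
    by_user: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user, item, weight in signals:
        by_user[user][item] = by_user[user].get(item, 0.0) + weight
    return by_user


def _top(scores: Dict[str, float], top_n: int) -> List[Tuple[str, float]]:
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def item_item_recs(signals: Iterable[Signal], top_n: int = 5) -> Ranked:
    scores: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for items in _group_by_user(signals).values():
        # Every ordered pair of items one user touched links A -> B,
        # credited with the smaller of that user's two weights.
        for a, b in permutations(items, 2):
            scores[a][b] += min(items[a], items[b])
    return {a: _top(neighbors, top_n) for a, neighbors in scores.items()}


def user_item_recs(signals: Iterable[Signal], item_recs: Ranked, top_n: int = 5) -> Ranked:
    result: Ranked = {}
    for user, items in _group_by_user(signals).items():
        candidates: Dict[str, float] = defaultdict(float)
        for item, weight in items.items():
            for neighbor, score in item_recs.get(item, []):
                if neighbor not in items:  # don't recommend what the user already has
                    candidates[neighbor] += weight * score
        result[user] = _top(candidates, top_n)
    return result


signals = [
    ("u1", "beatles", 10), ("u1", "stones", 4),
    ("u2", "beatles", 2), ("u2", "stones", 6), ("u2", "kinks", 3),
    ("u3", "kinks", 5),
]
ii = item_item_recs(signals)
print(ii["beatles"])                      # [('stones', 6.0), ('kinks', 2.0)]
print(user_item_recs(signals, ii)["u3"])  # [('stones', 15.0), ('beatles', 10.0)]
```

Using the minimum weight keeps one heavy listener from dominating a pair; production systems typically also normalize for item popularity, which this sketch omits.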

DEMO: STORE OUR RESULTS
Now that we have our results let’s store
them back to MongoDB for use by our
application.

%default II_COLLECTION 'item_item_recs'
%default UI_COLLECTION 'user_item_recs'
store item_item_recs into
'$CONN/$DB.$II_COLLECTION' using
com.mongodb.hadoop.pig.MongoInsertStorage('','');
store user_item_recs into
'$CONN/$DB.$UI_COLLECTION' using
com.mongodb.hadoop.pig.MongoInsertStorage('','');
Pig code

DEMO: RUN IT!
Now we’re going to use Mortar to start and
manage a Hadoop cluster to run our
recommender.

>mortar run pigscripts/mongo/lastfm-recsys-online.pig -f params/lastfm.params --clustersize 10
Taking code snapshot... done
Sending code snapshot to Mortar... done
Requesting job execution... done
job_id: 534462bea22f3803fd9cacca
Job status can be viewed on the web at:
https://app.mortardata.com/jobs/job_detail?job_id=534462bea22f3803fd9cacca

>db.item_item_recs.find()
{ "item_A":"yo-yo ma", "rank":1,
"item_B":"natalie clein" }
{ "item_A":"miley cyrus", "rank":1,
"item_B":"miley cyrus and billy ray cyrus" }
{ "item_A":"dimmu borgir", "rank":1,
"item_B":"ad inferna" }

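Each result document pairs an item_A with a ranked item_B. Assuming you already have scored neighbors for each item, a minimal Python sketch of flattening them into documents of that shape could look like this; to_rank_documents, the top_n cutoff, the alphabetical tie-break, and the example scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def to_rank_documents(neighbors: Dict[str, List[Tuple[str, float]]], top_n: int = 3) -> List[dict]:
    """Flatten scored neighbors into {item_A, rank, item_B} documents, rank 1 = best."""
    docs = []
    for item_a, scored in neighbors.items():
        # Highest score first; ties broken alphabetically so reruns are stable.
        ranked = sorted(scored, key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for rank, (item_b, _score) in enumerate(ranked, start=1):
            docs.append({"item_A": item_a, "rank": rank, "item_B": item_b})
    return docs


# Made-up scores, for illustration only.
example = {"yo-yo ma": [("other artist", 0.4), ("natalie clein", 0.9)]}
for doc in to_rank_documents(example):
    print(doc)
# {'item_A': 'yo-yo ma', 'rank': 1, 'item_B': 'natalie clein'}
# {'item_A': 'yo-yo ma', 'rank': 2, 'item_B': 'other artist'}
```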
EVALUATING YOUR RESULTS
Your Recommendation Engine
At first, use your domain knowledge to
judge whether the recommendations are
sensible. Mortar provides a recommendation
browser to help.

Optionally, generate detailed recommendations:

item_item_recs =
recsys__GetItemItemRecommendationsDetailed(user_signals);
Pig code

Later, run A/B tests with your
recommendations to see how they improve
the metrics you care about.
These tests are usually not multivariate,
and usually no training set is possible.
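When the metric is a conversion rate, one common way to read such a test is a two-proportion z-test. The Python sketch below uses only the standard library; the function name and the visitor and conversion counts are hypothetical, and a real analysis should fix the sample size and significance level before the test starts.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for conversion rate B minus conversion rate A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both groups convert at 0% or 100%; z is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts: control page vs. page with recommendations.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```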

CUSTOMIZING
To make customization easier Mortar has
help documentation and code covering
more than a dozen common cases:
• Removing bots from your signal data
• Removing out-of-stock items
• Boosting popular items
• Adding categories to your items
• Cold start
• Greater discovery and variety
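A rough idea of what two of these customizations involve, sketched in Python: dropping users whose signal counts look automated, and filtering out recommendations for items that cannot currently be bought. The threshold, helper names and data shapes are assumptions, not Mortar's documented approach.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Signal = Tuple[str, str, float]  # (user, item, weight)


def remove_bots(signals: List[Signal], max_signals_per_user: int) -> List[Signal]:
    """Drop every signal from users with implausibly many signals (a crude bot heuristic)."""
    per_user = Counter(user for user, _item, _weight in signals)
    return [s for s in signals if per_user[s[0]] <= max_signals_per_user]


def drop_out_of_stock(recs: Iterable[Tuple[str, str]], in_stock: Set[str]) -> List[Tuple[str, str]]:
    """Keep (item_A, item_B) recommendations only when item_B is currently available."""
    return [(a, b) for a, b in recs if b in in_stock]


signals = [("crawler", "a", 1), ("crawler", "b", 1), ("crawler", "c", 1), ("alice", "a", 2)]
print(remove_bots(signals, max_signals_per_user=2))                 # [('alice', 'a', 2)]
print(drop_out_of_stock([("a", "b"), ("a", "c")], in_stock={"b"}))  # [('a', 'b')]
```

In this sketch, bot filtering runs before recommendations are generated so bots never distort the item-item scores, while the stock filter runs afterwards so the same results can be re-filtered as inventory changes.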

PRODUCTION QUESTIONS
How do you read your MongoDB?
1) Read backup files from S3
2) Connect to secondary nodes
3) Connect to primary nodes
4) Connect to dedicated analytics nodes
5) Turn file-system snapshot backups into
BSON

PRODUCTION QUESTIONS
How do you release new recommendations
while serving the old ones?
API
Flip between live and offline database
Also enables rollback
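The live/offline flip can be sketched as a two-copy ("blue/green") store: the batch job writes to whichever copy is offline, publishing is a single pointer change, and rollback points back at the previous copy. RecommendationStore is a hypothetical in-memory stand-in; in production each copy would be a separate MongoDB database or collection behind your API.

```python
from typing import Dict, List, Optional


class RecommendationStore:
    """Serve from a live copy while a new batch loads into the offline copy (hypothetical)."""

    def __init__(self) -> None:
        self._copies: Dict[str, Dict[str, List[str]]] = {"blue": {}, "green": {}}
        self._live = "blue"

    @property
    def _offline(self) -> str:
        return "green" if self._live == "blue" else "blue"

    def load_offline(self, recs: Dict[str, List[str]]) -> None:
        # The batch job writes here; readers never see a half-written batch.
        self._copies[self._offline] = dict(recs)

    def flip(self) -> None:
        # Publishing the new batch is a single pointer change.
        self._live = self._offline

    def rollback(self) -> None:
        # The previous batch stays intact in the other copy until the next load_offline.
        self.flip()

    def get(self, key: str) -> Optional[List[str]]:
        return self._copies[self._live].get(key)


store = RecommendationStore()
store.load_offline({"u1": ["beatles"]})
store.flip()
store.load_offline({"u1": ["kinks"]})
print(store.get("u1"))  # ['beatles'] (new batch loaded but not yet live)
```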

WE DISCUSSED
Summary
What a recommendation engine is
How Hadoop works with MongoDB
Set up a demo recommendation engine
How to connect your data
Touched on advanced techniques
Steered away from potholes
Resources for next step

help.mortardata.com/recommenders
answers.mortardata.com
@kky
@mortardata
