Personalized recommendations drive business, helping people find the products they want, the news they need, and the music they didn't know they would love. Despite the obvious advantages, many companies either don't have recommendations or don't leverage their data to make good ones. Too many recommendation engines are black-box algorithms that are hard to change or don't scale well. Using the same recommendation techniques used at StubHub, Viacom, and AP, this technical webinar will show you how to load your data from MongoDB into Hadoop, generate recommendations, and then write those recommendations back to MongoDB, ready to serve end users. This webinar will prepare you to build a custom recommender for your company that is highly scalable, easy to understand, and built on open-source technology.
K Young: About the speaker
K Young is the CEO of Mortar Data. Mortar serves data scientists and engineers with a service that makes creating and operating high-scale data pipelines easy. Mortar contributes to several open source projects including Pig, Luigi, and the Mongo-Hadoop connector. Prior to founding Mortar Data, K built software that reaches one in ten public school students in the U.S. He holds a Computer Science degree from Rice University.
3. EXAMPLES
Recommendation Engine
LinkedIn: 50% of new connections come
from "People You May Know"
Netflix: 75% of content is viewed because
of a recommendation
Amazon: 35% of sales are driven by
recommendations
5. FOR THIS WEBINAR
Agenda
1. Recommendation Engines
2. Hadoop
3. Demo: Build a Recommendation Engine
4. Your Recommendation Engine
5. Q&A
6. Recommendation Engine
NOW GENERALLY AVAILABLE
• Open source, free
• Very flexible
• Massively Scalable
• 100% Customizable
• Tested and proven
7. Recommendation Engine
Technical implementation of how humans
make recommendations.
Using:
• past behavior
• similar users
• content metadata
• outside signals, e.g. Instagram
HOW DO THEY WORK?
13. A WARNING
Recommendation Engine
Recommendation engines are famously
hard to launch because they touch:
engineering, finance, product, executive.
How to succeed:
1) speedy implementation (target 1 week)
2) engine flexibility
3) gradual roll-out
4) visible KPI-impact
14. RAPID OVERVIEW
Hadoop
Platform for distributed data processing.
Strengths:
• Can scale up to thousands of computers
• Widely used
• Very broadly applicable
• Free, open
Problem:
• Difficult to use for complex problems
26. Mortar
FAST INTRO
Data science lacks a way to
organize, test, deploy, and collaborate with code.
So:
• One-button code deployment, powered by GitHub
• Award-winning job monitoring and visualization
• Realtime log collection and error analysis
• Free local development with one-click installation
29. DEFINITIONS
Recommendation Engine
Users: Someone interacting with your
items and generating events that you
capture
Items: The things you are recommending:
videos, articles, products, etc.
Signal: A user-item interaction with a
weighting that tells us the relative value of
the interaction.
31. STEPS
Recommendation Engine
Steps in a recommendation engine:
• Load your data
• Generate your signals
• Call code to generate recommendations
• Store your recommendations
Not covered today:
• Serve your recommendations
• Track KPI-impact
32. DEMO
Recommendation Engine
17.5MM documents of 360K users’ top
played artists. Provided by Last.fm at
http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html
Used a Pig job to load a MongoLab
database with the data.
37. user_signals = foreach raw_input generate
user,
artist_name as item,
num_plays as weight:int;
Pig code
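For readers who do not know Pig, the projection above can be sketched in Python. This is only an illustration of the same reshaping into (user, item, weight) signals. The `Signal` type, the `to_signals` helper, and the sample rows are hypothetical and are not part of the webinar's code or the Last.fm dataset.

```python
from typing import Iterable, List, NamedTuple


class Signal(NamedTuple):
    """One user-item interaction with its relative weight."""
    user: str
    item: str
    weight: int


def to_signals(raw_input: Iterable[dict]) -> List[Signal]:
    """Project raw play-count records onto (user, item, weight) signals."""
    return [
        Signal(
            user=row["user"],
            item=row["artist_name"],
            weight=int(row["num_plays"]),  # mirrors the cast to int in the Pig snippet
        )
        for row in raw_input
    ]


# Hypothetical sample rows, not taken from the Last.fm dataset.
raw_input = [
    {"user": "u1", "artist_name": "yo-yo ma", "num_plays": 12},
    {"user": "u2", "artist_name": "dimmu borgir", "num_plays": "3"},
]
signals = to_signals(raw_input)
```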
38. DEMO: CALL MORTAR
Recommendation Engine
Now that the data is in the correct format,
we’ll call the Mortar algorithms for
generating item-item and user-item
recommendations.
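The slides do not show how Mortar's algorithms work internally, so the following is not Mortar's method. It is a minimal stand-in, assuming a simple weighted co-occurrence score, to show what "item-item recommendations" means: for each item, rank the other items that the same users interacted with. The function name `item_item_recs` and the sample signals are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_item_recs(signals, top_n=2):
    """Rank, for each item, the other items most often shared by the same users.

    signals: iterable of (user, item, weight) tuples.
    Returns {item_A: [(rank, item_B), ...]} with rank starting at 1.
    """
    # Collapse repeated signals into one total weight per (user, item).
    items_by_user = defaultdict(dict)
    for user, item, weight in signals:
        items_by_user[user][item] = items_by_user[user].get(item, 0) + weight

    # Score each item pair by the smaller of the two weights, summed over users.
    scores = defaultdict(lambda: defaultdict(float))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            shared = min(items[a], items[b])
            scores[a][b] += shared
            scores[b][a] += shared

    # Keep the top_n neighbours per item, breaking ties alphabetically.
    recs = {}
    for item_a, neighbours in scores.items():
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        recs[item_a] = [
            (rank, item_b) for rank, (item_b, _) in enumerate(ranked, start=1)
        ]
    return recs


# Hypothetical signals for three users and three items.
signals = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 2), ("u2", "b", 4), ("u2", "c", 1),
    ("u3", "a", 1), ("u3", "c", 6),
]
recs = item_item_recs(signals)
```

User-item recommendations can then be derived by looking up the neighbours of the items each user already has signals for. A production engine would also normalize for item popularity, which this sketch leaves out.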
40. DEMO: STORE OUR RESULTS
Recommendation Engine
Now that we have our results, let’s store
them back to MongoDB for use by our
application.
41. %default II_COLLECTION 'item_item_recs'
%default UI_COLLECTION 'user_item_recs'
store item_item_recs into
'$CONN/$DB.$II_COLLECTION' using
com.mongodb.hadoop.pig.MongoInsertStorage('','');
store user_item_recs into
'$CONN/$DB.$UI_COLLECTION' using
com.mongodb.hadoop.pig.MongoInsertStorage('','');
Pig code
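As an illustration of what ends up in the collection, here is a sketch in Python that flattens ranked recommendations into documents with `item_A`, `rank`, and `item_B` fields and writes them in one batch. The helpers `to_documents` and `store_recs` and the `InMemoryCollection` stand-in are hypothetical. With a real database, a pymongo collection could be passed instead, since it exposes `insert_many`.

```python
def to_documents(recs):
    """Flatten {item_A: [(rank, item_B), ...]} into one document per recommendation."""
    return [
        {"item_A": item_a, "rank": rank, "item_B": item_b}
        for item_a, ranked in recs.items()
        for rank, item_b in ranked
    ]


class InMemoryCollection:
    """Stand-in for a MongoDB collection so the sketch runs without a server."""

    def __init__(self):
        self.documents = []

    def insert_many(self, documents):
        self.documents.extend(documents)


def store_recs(collection, recs):
    """Write recommendations through any object exposing insert_many()."""
    documents = to_documents(recs)
    if documents:  # pymongo rejects an empty batch, so skip the call
        collection.insert_many(documents)
    return len(documents)


collection = InMemoryCollection()
stored = store_recs(collection, {"yo-yo ma": [(1, "natalie clein")]})
```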
42. DEMO: RUN IT!
Recommendation Engine
Now we’re going to use Mortar to start and
manage a Hadoop cluster to run our
recommender.
43. >mortar run pigscripts/mongo/lastfm-recsys-online.pig -f params/lastfm.params --clustersize 10
Taking code snapshot... done
Sending code snapshot to Mortar... done
Requesting job execution... done
job_id: 534462bea22f3803fd9cacca
Job status can be viewed on the web at:
https://app.mortardata.com/jobs/job_detail?job_id=534462bea22f3803fd9cacca
45. >db.item_item_recs.find()
{ "item_A":"yo-yo ma", "rank":1,
"item_B":"natalie clein" }
{ "item_A":"miley cyrus", "rank":1,
"item_B":"miley cyrus and billy ray cyrus" }
{ "item_A":"dimmu borgir", "rank":1,
"item_B":"ad inferna" }
46. EVALUATING YOUR RESULTS
Your Recommendation Engine
At first, use your domain knowledge
to determine whether
recommendations are sensible.
Mortar provides a recommendation
browser.
52. EVALUATING YOUR RESULTS
Your Recommendation Engine
Later, run A/B tests with your
recommendations to see how they improve
the metrics you care about.
Usually not multivariate.
Usually no training set is possible.
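One common way to read an A/B test on recommendations is to compare a conversion rate between a control arm and a treatment arm. The sketch below assumes a two-proportion z-test as the comparison, which the slides do not specify. The function name `ab_lift` and all counts are hypothetical.

```python
import math


def ab_lift(control_conv, control_n, treat_conv, treat_n):
    """Return (relative lift, z-score) for treatment versus control conversion rates."""
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    # Pooled rate under the assumption that both arms convert equally.
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    return (p_t - p_c) / p_c, (p_t - p_c) / std_err


# Hypothetical counts: 10,000 users per arm.
lift, z = ab_lift(control_conv=500, control_n=10_000, treat_conv=560, treat_n=10_000)
```

A larger absolute z-score means the difference is less likely to be noise. The threshold to act on is a product decision and should be fixed before the test starts.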
53. CUSTOMIZING
Your Recommendation Engine
To make customization easier Mortar has
help documentation and code covering
more than a dozen common cases:
• Removing bots from your signal data
• Removing out-of-stock items
• Boosting popular items
• Adding categories to your items
• Cold start
• Greater discovery and variety
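Two of the cases above, removing bots and removing out-of-stock items, amount to filtering the signals before recommendations are generated. This is a minimal sketch, assuming a bot is any user with more events than a chosen threshold. It is not Mortar's documented code, and `clean_signals`, the threshold, and the sample data are hypothetical.

```python
from collections import Counter


def clean_signals(signals, out_of_stock, max_events_per_user=1000):
    """Drop signals from suspiciously active users and for unavailable items."""
    signals = list(signals)
    # Count events per user so heavy hitters can be treated as bots.
    events_per_user = Counter(user for user, _, _ in signals)
    return [
        (user, item, weight)
        for user, item, weight in signals
        if events_per_user[user] <= max_events_per_user
        and item not in out_of_stock
    ]


# Hypothetical data: "bot" has three events and item "z" is out of stock.
signals = [
    ("bot", "a", 1), ("bot", "b", 1), ("bot", "c", 1),
    ("u1", "a", 2), ("u1", "z", 1),
]
cleaned = clean_signals(signals, out_of_stock={"z"}, max_events_per_user=2)
```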
54. PRODUCTION QUESTIONS
Your Recommendation Engine
How do you read your MongoDB?
1) Read backup files from S3
2) Connect to secondary nodes
3) Connect to primary nodes
4) Connect to dedicated analytics nodes
5) Turn file-system snapshot backups into
BSON
55. PRODUCTION QUESTIONS
Your Recommendation Engine
How do you release new recommendations
while serving the old ones?
API
Flip between live and offline database
Also enables rollback
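The flip between a live and an offline database can be sketched as a pointer swap behind the API. This assumes two interchangeable stores, modeled here as plain dictionaries. The `RecsRouter` class is hypothetical and is not taken from the webinar.

```python
class RecsRouter:
    """Serve from one store while the other is rebuilt, then swap."""

    def __init__(self, store_a, store_b):
        self._live, self._offline = store_a, store_b

    def get(self, user):
        """Read recommendations for a user from the live store only."""
        return self._live.get(user, [])

    def load_offline(self, recs):
        """Replace the offline store's contents with a fresh batch."""
        self._offline.clear()
        self._offline.update(recs)

    def flip(self):
        """Promote the offline store to live; calling flip() again rolls back."""
        self._live, self._offline = self._offline, self._live


router = RecsRouter({"u1": ["a"]}, {})
router.load_offline({"u1": ["b"]})
before = router.get("u1")       # still served from the old store
router.flip()
after = router.get("u1")        # now served from the new batch
router.flip()
rolled_back = router.get("u1")  # the old store is live again
```

Because the previous store is left untouched until the next load, rolling back is a second flip and needs no data copy.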
56. WE DISCUSSED
Summary
What a recommendation engine is
How Hadoop works with MongoDB
Set up a demo recommendation engine
How to connect your data
Touched on advanced techniques
Steered away from potholes
Resources for next steps