This document discusses using big data and machine learning techniques on AWS for content recommendations. It describes three common approaches: search with boosting, which adjusts search rankings based on popularity signals; collaborative filtering, which identifies similar users and items; and neural networks, which use historical user events to train a model that predicts favorites. It also introduces Amazon DSSTNE (Deep Scalable Sparse Tensor Network Engine), which automates GPU-accelerated training and prediction at scale for recommendation systems.
5. What are the common asks?
Content
• Top content
  • Engagement
  • Plays per session
  • Drop-off
• Referral path
• Recommendations
Audience
• Acquisitions
• Churn
• Where, when, who
• Segmentation
• Cohorts
Operations
• How much buffering
• Best CDN paths
• Top devices
• Uniques per platform
Other
• Monetization
• Ad spend
• Social media
  • Mentions
  • Sentiment analysis
• A/B feature testing
• Talent management
6. (Some) AWS Big Data Services
Amazon S3: Internet-scale storage
• Content lake
• Raw signals
• Clickstream
• Profiles & models
Amazon Elasticsearch: hosted Elasticsearch, a distributed search engine
Amazon EMR: hosted Hadoop compute framework
• Data transformation
• Aggregations
• Predictions
• Discovery
• Visualization
13. vote = count(Event: title, device, time)
Each title's vote count is simply the number of play events (title, device, time) recorded for it.
VOD Catalog:
• title: “Toy Dogs and Their Owners” | date: 2014 | content: “Cute but surreal look at …” | votes: 25
• title: “The Usual Suspects” | date: 1995 | content: “A sole survivor …” | votes: 85
• title: “Star Wars 17” | date: 2026 | content: “Yoda force ghost returns … surreal” | votes: 99
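The vote formula above can be sketched as a simple aggregation; the play events below are made up for illustration:

```python
from collections import Counter

# Hypothetical play events as (title, device, time) tuples, matching
# vote = count(Event: title, device, time) from the slide.
events = [
    ("Star Wars 17", "roku", "2016-05-01T20:00:00"),
    ("Star Wars 17", "ios", "2016-05-01T21:00:00"),
    ("The Usual Suspects", "web", "2016-05-02T19:30:00"),
]

# One vote per event; these counts become the dynamic "votes" field
# that is written back into the catalog documents.
votes = Counter(title for title, _device, _time in events)

print(votes["Star Wars 17"])  # 2
```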
14. GET /catalog/movies/_search
The search engine adjusts rankings based on the additional contribution of the dynamic field “votes”.
VOD Catalog:
• title: “Toy Dogs and Their Owners” | date: 2014 | content: “Cute but surreal …” | votes: 25
• title: “The Usual Suspects” | date: 1995 | content: “A sole survivor …” | votes: 85
• title: “Star Wars 17” | date: 2026 | content: “Yoda force ghost returns … surreal” | votes: 99 → Rank #1
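One way slide 14's boosted search could look is an Elasticsearch function_score request body. The endpoint and the “votes” field come from the deck; the “surreal” match term and the log1p modifier are illustrative assumptions:

```python
# Hypothetical request body for GET /catalog/movies/_search: text
# relevance plus a contribution from the dynamic "votes" field.
query = {
    "query": {
        "function_score": {
            "query": {"match": {"content": "surreal"}},
            "field_value_factor": {
                "field": "votes",      # popularity signal maintained in batch
                "modifier": "log1p",   # dampen very large vote counts
                "missing": 0,
            },
            "boost_mode": "sum",       # final score = relevance + votes factor
        }
    }
}
```

With votes: 99, “Star Wars 17” would receive the largest boost and rank first, as on the slide.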
15. Collaborative Filtering
1. Capture audience signal data: create a history of user and item preferences
2. Estimate similar users and items
3. Record these in a search engine
4. Query the search engine with the user's history
5. Enjoy recommendations!
http://www.slideshare.net/tdunning/recommendation-techn
18. Step 1: Logs → History Matrix
Raw log events (user, item):
• User1: Thing1
• User2: Thing2
• User3: Thing3
• User2: Thing4
• User5: Thing1
• User1: Thing2
• User1: Thing3
[Diagram: the events are collected into a per-user history matrix; the diagram labels the users Mike, Jon, Mary, Phil, Kris.]
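Step 1 can be sketched in a few lines: raw (user, item) log events become a sparse per-user history. The names mirror the slide's example data:

```python
from collections import defaultdict

# Raw log lines as (user, item) pairs, as on the slide.
log_lines = [
    ("User1", "Thing1"), ("User2", "Thing2"), ("User3", "Thing3"),
    ("User2", "Thing4"), ("User5", "Thing1"), ("User1", "Thing2"),
    ("User1", "Thing3"),
]

# Sparse history "matrix": user -> set of items they interacted with.
history = defaultdict(set)
for user, item in log_lines:
    history[user].add(item)

print(sorted(history["User1"]))  # ['Thing1', 'Thing2', 'Thing3']
```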
19. Step 2: Estimate Similar Things
[Diagram: the history matrix becomes an item-item matrix whose entries count how often two items co-occur across user histories; example counts shown: 2, 8, 2, 4.]
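A minimal sketch of step 2: count how often two items appear in the same user's history, which is what the item-item matrix entries represent. The history data mirrors step 1:

```python
from collections import defaultdict
from itertools import combinations

# Per-user histories from step 1 (illustrative data).
history = {
    "User1": {"Thing1", "Thing2", "Thing3"},
    "User2": {"Thing2", "Thing4"},
    "User5": {"Thing1"},
}

# Sparse item-item matrix: (item_a, item_b) -> co-occurrence count.
cooccur = defaultdict(int)
for items in history.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1

print(cooccur[("Thing1", "Thing2")])  # 1
```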
20. Step 3: Reduce to Interesting Pairs
[Diagram: the item-item matrix is filtered with the log-likelihood ratio (LLR) test, keeping only statistically interesting pairs as indicators (“Items Similar To This…”).]
21. Step 3: Reduce to Interesting Pairs
[Diagram: the resulting per-item indicator lists (“Items Similar To This…”).]
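Step 3's LLR test can be sketched directly. This follows the well-known log-likelihood ratio formulation popularized by Ted Dunning (and used in Apache Mahout), applied to the 2x2 contingency table of a candidate item pair:

```python
from math import log

def x_log_x(x):
    return x * log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized Shannon entropy, as in Dunning's LLR formulation.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table:
    k11 = users with both items, k12/k21 = one item only,
    k22 = users with neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

# Independent co-occurrence scores ~0; strong association scores high,
# so only high-LLR pairs survive as indicators.
print(llr(1, 1, 1, 1), llr(10, 0, 0, 10))
```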
22. Step 4: Store Indicators in a Search Engine (BATCH)
By title:
• Superman → Highlander, Dune
• Star Wars → Raiders, Minority Report
• Highlander → Superman
• Mulan → Home Alone, Mermaid
• Star Trek → …
By item ID (as indexed):
• 4587 → 223, 5234
• 748 → 5345, 235
• 12 → 8234
• 245 → 9543, 7673
• 3456 → 4587
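Step 4 might look like the following: each catalog item gets an "indicators" field holding the IDs of its similar items, and the documents are written in Elasticsearch's newline-delimited bulk format. The ID values come from the slide; the index name "catalog" and field name "indicators" are assumptions, and actually POSTing the payload to the `_bulk` endpoint is left out:

```python
import json

# Indicator lists from the slide (item ID -> similar item IDs).
indicators = {
    "4587": ["223", "5234"],   # Superman
    "748": ["5345", "235"],    # Star Wars
    "12": ["8234"],            # Highlander
}

# Build the newline-delimited bulk body: an action line, then a
# document line, for each item.
bulk_body = []
for item_id, similar in indicators.items():
    bulk_body.append(json.dumps({"index": {"_index": "catalog", "_id": item_id}}))
    bulk_body.append(json.dumps({"indicators": similar}))

payload = "\n".join(bulk_body) + "\n"  # bulk requests must end with a newline
```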
24. Step 5: Query Search Engine w/ User History (REALTIME)
Index (item ID, title → indicators):
• 748 Star Wars → 5345, 235
• 12 Highlander → 8234
• 245 Mulan → 9543, 7673
• 4587 Superman → 12, 5234
• 3456 Star Trek → 2458 …
Query “12” (an item from the user's history) matches any document whose indicators contain 12, such as 4587 Superman, and those matches come back as recommendations.
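The realtime query of step 5 could be sketched as a terms query over the indicators field; the index layout and field name are assumptions carried over from the batch step:

```python
# The user's recent item IDs are matched against the "indicators"
# field, so items "similar to what they watched" surface as hits.
user_history = ["12"]  # e.g. the user watched Highlander (ID 12)

recommendation_query = {
    "query": {"terms": {"indicators": user_history}},
    "size": 10,  # top-N recommendations
}
```

Against slide 24's index, this query would hit 4587 Superman, whose indicators include 12.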
25. Neural Networks
1. Capture audience signal data: create a history of user and item preferences
2. Create a model (GPU) which captures the relationships between users and items
3. Use the model to score (GPU) users and predict their favorite items
4. Enjoy recommendations!
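As a toy illustration of steps 2 and 3 (not DSSTNE, and far simpler than what runs on a GPU), a tiny latent-factor model can be trained on a 0/1 user-item history and then used to score items a user never logged. All data and hyperparameters below are synthetic:

```python
import random

random.seed(0)  # deterministic toy run

n_users, n_items, dim = 4, 4, 2
# Synthetic 0/1 history: users 0-1 watch items 0-1, users 2-3 watch
# items 2-3. Pair (1, 1) is deliberately left unobserved so the model
# has to infer it from user 1's similarity to user 0.
seen = {(0, 0), (0, 1), (1, 0), (2, 2), (2, 3), (3, 2), (3, 3)}

U = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
V = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]

def score(u, i):
    """Predicted affinity of user u for item i (dot product of factors)."""
    return sum(U[u][k] * V[i][k] for k in range(dim))

lr, reg = 0.1, 0.01
for _ in range(1000):  # SGD over every (user, item) cell
    for u in range(n_users):
        for i in range(n_items):
            err = (1.0 if (u, i) in seen else 0.0) - score(u, i)
            for k in range(dim):
                du = err * V[i][k] - reg * U[u][k]
                dv = err * U[u][k] - reg * V[i][k]
                U[u][k] += lr * du
                V[i][k] += lr * dv

# User 1 never logged item 1, but shares tastes with user 0, so the
# low-rank model scores item 1 above the other-block items 2 and 3.
print(score(1, 1), score(1, 2))
```

The point of the sketch is the shape of the workflow (fit a model to history, then score unseen user-item pairs); DSSTNE does this with sparse neural networks across GPUs instead of a two-dimensional dot product.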
31. Why DSSTNE
• Automated management of multi-GPU parallelism
• Scale-out training
• Scale-out predictions
• Optimized for sparse-data efficiency: lots of users and lots of products, but relatively small overlap
https://github.com/amznlabs/amazon-dsstne
32. Training and prediction from the command line:
% train -c config.json -i gl_input.nc -n the-computed-model.nc …
(… a little later …)
% predict … -k 10 -n the-computed-model.nc … -r input-observations.txt -s output-recommendations.txt