This document discusses using big data and machine learning techniques on AWS for content recommendations. It describes three common approaches: search with boosting, which adjusts search rankings based on popularity signals; collaborative filtering, which identifies similar users and items; and neural networks, which use historical user events to train a model that predicts favorites. It also introduces Amazon DSSTNE (Deep Scalable Sparse Tensor Network Engine), which automates GPU-accelerated training and prediction at scale for recommendation systems.
5. What are the common asks?
Content
• Top content
  • Engagement
  • Plays per session
  • Drop-off
• Referral path
• Recommendations
Audience
• Acquisitions
• Churn
• Where, when, who
• Segmentation
• Cohorts
Operations
• How much buffering
• Best CDN paths
• Top devices
• Uniques per platform
Other
• Monetization
• Ad spend
• Social media
  • Mentions
  • Sentiment analysis
• A/B feature testing
• Talent management
6. (Some) AWS Big Data Services
Amazon S3: Internet-scale storage
• Content lake
• Raw signals
• Clickstream
• Profiles & models
Amazon Elasticsearch: hosted Elasticsearch, a distributed search engine
Amazon EMR: hosted Hadoop compute framework
• Data transformation
• Aggregations
• Predictions
• Discovery
• Visualization
13. vote = count(Event: title, device, time)
Each title's vote count is simply the number of play events (title, device, time) recorded for it.
VOD Catalog:
• title: “Toy Dogs and Their Owners” | date: 2014 | content: “Cute but surreal look at …” | votes: 25
• title: “The Usual Suspects” | date: 1995 | content: “A sole survivor …” | votes: 85
• title: “Star Wars 17” | date: 2026 | content: “Yoda force ghost returns … surreal” | votes: 99
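The vote formula above can be sketched as a simple aggregation; the play events below are made up for illustration:

```python
from collections import Counter

# Hypothetical play events as (title, device, time) tuples, matching
# vote = count(Event: title, device, time) from the slide.
events = [
    ("Star Wars 17", "roku", "2016-05-01T20:00:00"),
    ("Star Wars 17", "ios", "2016-05-01T21:00:00"),
    ("The Usual Suspects", "web", "2016-05-02T19:30:00"),
]

# One vote per event; these counts become the dynamic "votes" field
# that is written back into the catalog documents.
votes = Counter(title for title, _device, _time in events)

print(votes["Star Wars 17"])  # 2
```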
14. GET /catalog/movies/_search
The search engine adjusts rankings based on the additional contribution of the dynamic field “votes”.
VOD Catalog:
• title: “Toy Dogs and Their Owners” | date: 2014 | content: “Cute but surreal …” | votes: 25
• title: “The Usual Suspects” | date: 1995 | content: “A sole survivor …” | votes: 85
• title: “Star Wars 17” | date: 2026 | content: “Yoda force ghost returns … surreal” | votes: 99 → Rank #1
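One way slide 14's boosted search could look is an Elasticsearch function_score request body. The endpoint and the “votes” field come from the deck; the “surreal” match term and the log1p modifier are illustrative assumptions:

```python
# Hypothetical request body for GET /catalog/movies/_search: text
# relevance plus a contribution from the dynamic "votes" field.
query = {
    "query": {
        "function_score": {
            "query": {"match": {"content": "surreal"}},
            "field_value_factor": {
                "field": "votes",      # popularity signal maintained in batch
                "modifier": "log1p",   # dampen very large vote counts
                "missing": 0,
            },
            "boost_mode": "sum",       # final score = relevance + votes factor
        }
    }
}
```

With votes: 99, “Star Wars 17” would receive the largest boost and rank first, as on the slide.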
15. Collaborative Filtering
1. Capture audience signal data: create a history of user and item preferences
2. Estimate similar users and items
3. Record these in a search engine
4. Query the search engine with the user's history
5. Enjoy recommendations!
http://www.slideshare.net/tdunning/recommendation-techn
18. Step 1: Logs → History Matrix
Raw log events (user, item):
• User1: Thing1
• User2: Thing2
• User3: Thing3
• User2: Thing4
• User5: Thing1
• User1: Thing2
• User1: Thing3
[Diagram: the events are collected into a per-user history matrix; the diagram labels the users Mike, Jon, Mary, Phil, Kris.]
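Step 1 can be sketched in a few lines: raw (user, item) log events become a sparse per-user history. The names mirror the slide's example data:

```python
from collections import defaultdict

# Raw log lines as (user, item) pairs, as on the slide.
log_lines = [
    ("User1", "Thing1"), ("User2", "Thing2"), ("User3", "Thing3"),
    ("User2", "Thing4"), ("User5", "Thing1"), ("User1", "Thing2"),
    ("User1", "Thing3"),
]

# Sparse history "matrix": user -> set of items they interacted with.
history = defaultdict(set)
for user, item in log_lines:
    history[user].add(item)

print(sorted(history["User1"]))  # ['Thing1', 'Thing2', 'Thing3']
```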
19. Step 2: Estimate Similar Things
[Diagram: the history matrix becomes an item-item matrix whose entries count how often two items co-occur across user histories; example counts shown: 2, 8, 2, 4.]
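A minimal sketch of step 2: count how often two items appear in the same user's history, which is what the item-item matrix entries represent. The history data mirrors step 1:

```python
from collections import defaultdict
from itertools import combinations

# Per-user histories from step 1 (illustrative data).
history = {
    "User1": {"Thing1", "Thing2", "Thing3"},
    "User2": {"Thing2", "Thing4"},
    "User5": {"Thing1"},
}

# Sparse item-item matrix: (item_a, item_b) -> co-occurrence count.
cooccur = defaultdict(int)
for items in history.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1

print(cooccur[("Thing1", "Thing2")])  # 1
```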
20. Step 3: Reduce to Interesting Pairs
[Diagram: the item-item matrix is filtered with the log-likelihood ratio (LLR) test, keeping only statistically interesting pairs as indicators (“Items Similar To This…”).]
21. Step 3: Reduce to Interesting Pairs
[Diagram: the resulting per-item indicator lists (“Items Similar To This…”).]
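Step 3's LLR test can be sketched directly. This follows the well-known log-likelihood ratio formulation popularized by Ted Dunning (and used in Apache Mahout), applied to the 2x2 contingency table of a candidate item pair:

```python
from math import log

def x_log_x(x):
    return x * log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized Shannon entropy, as in Dunning's LLR formulation.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table:
    k11 = users with both items, k12/k21 = one item only,
    k22 = users with neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

# Independent co-occurrence scores ~0; strong association scores high,
# so only high-LLR pairs survive as indicators.
print(llr(1, 1, 1, 1), llr(10, 0, 0, 10))
```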
22. Step 4: Store Indicators in a Search Engine (BATCH)
By title:
• Superman → Highlander, Dune
• Star Wars → Raiders, Minority Report
• Highlander → Superman
• Mulan → Home Alone, Mermaid
• Star Trek → …
By item ID (as indexed):
• 4587 → 223, 5234
• 748 → 5345, 235
• 12 → 8234
• 245 → 9543, 7673
• 3456 → 4587
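Step 4 might look like the following: each catalog item gets an "indicators" field holding the IDs of its similar items, and the documents are written in Elasticsearch's newline-delimited bulk format. The ID values come from the slide; the index name "catalog" and field name "indicators" are assumptions, and actually POSTing the payload to the `_bulk` endpoint is left out:

```python
import json

# Indicator lists from the slide (item ID -> similar item IDs).
indicators = {
    "4587": ["223", "5234"],   # Superman
    "748": ["5345", "235"],    # Star Wars
    "12": ["8234"],            # Highlander
}

# Build the newline-delimited bulk body: an action line, then a
# document line, for each item.
bulk_body = []
for item_id, similar in indicators.items():
    bulk_body.append(json.dumps({"index": {"_index": "catalog", "_id": item_id}}))
    bulk_body.append(json.dumps({"indicators": similar}))

payload = "\n".join(bulk_body) + "\n"  # bulk requests must end with a newline
```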
24. Step 5: Query Search Engine w/ User History (REALTIME)
Index (item ID, title → indicators):
• 748 Star Wars → 5345, 235
• 12 Highlander → 8234
• 245 Mulan → 9543, 7673
• 4587 Superman → 12, 5234
• 3456 Star Trek → 2458 …
Query “12” (an item from the user's history) matches any document whose indicators contain 12, such as 4587 Superman, and those matches come back as recommendations.
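The realtime query of step 5 could be sketched as a terms query over the indicators field; the index layout and field name are assumptions carried over from the batch step:

```python
# The user's recent item IDs are matched against the "indicators"
# field, so items "similar to what they watched" surface as hits.
user_history = ["12"]  # e.g. the user watched Highlander (ID 12)

recommendation_query = {
    "query": {"terms": {"indicators": user_history}},
    "size": 10,  # top-N recommendations
}
```

Against slide 24's index, this query would hit 4587 Superman, whose indicators include 12.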
25. Neural Networks
1. Capture audience signal data: create a history of user and item preferences
2. Create a model (GPU) which captures the relationships between users and items
3. Use the model to score (GPU) users and predict their favorite items
4. Enjoy recommendations!
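As a toy illustration of steps 2 and 3 (not DSSTNE, and far simpler than what runs on a GPU), a tiny latent-factor model can be trained on a 0/1 user-item history and then used to score items a user never logged. All data and hyperparameters below are synthetic:

```python
import random

random.seed(0)  # deterministic toy run

n_users, n_items, dim = 4, 4, 2
# Synthetic 0/1 history: users 0-1 watch items 0-1, users 2-3 watch
# items 2-3. Pair (1, 1) is deliberately left unobserved so the model
# has to infer it from user 1's similarity to user 0.
seen = {(0, 0), (0, 1), (1, 0), (2, 2), (2, 3), (3, 2), (3, 3)}

U = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
V = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]

def score(u, i):
    """Predicted affinity of user u for item i (dot product of factors)."""
    return sum(U[u][k] * V[i][k] for k in range(dim))

lr, reg = 0.1, 0.01
for _ in range(1000):  # SGD over every (user, item) cell
    for u in range(n_users):
        for i in range(n_items):
            err = (1.0 if (u, i) in seen else 0.0) - score(u, i)
            for k in range(dim):
                du = err * V[i][k] - reg * U[u][k]
                dv = err * U[u][k] - reg * V[i][k]
                U[u][k] += lr * du
                V[i][k] += lr * dv

# User 1 never logged item 1, but shares tastes with user 0, so the
# low-rank model scores item 1 above the other-block items 2 and 3.
print(score(1, 1), score(1, 2))
```

The point of the sketch is the shape of the workflow (fit a model to history, then score unseen user-item pairs); DSSTNE does this with sparse neural networks across GPUs instead of a two-dimensional dot product.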
31. Why DSSTNE
• Automated management of multi-GPU parallelism
• Scale-out training
• Scale-out predictions
• Optimized for sparse-data efficiency: lots of users and lots of products, but relatively small overlap
https://github.com/amznlabs/amazon-dsstne
32. Training and prediction from the command line:
% train -c config.json -i gl_input.nc -n the-computed-model.nc …
(… a little later …)
% predict … -k 10 -n the-computed-model.nc … -r input-observations.txt -s output-recommendations.txt