Pinterest - Big Data Machine Learning Platform at Pinterest

Big Data ML Platform at Pinterest
Yongsheng Wu
Pinterest: pinterest.com/yswu
LinkedIn: linkedin.com/in/yongshengwu
Twitter: @yswu
06/17/2019

Pinterest: The World’s Catalog of Ideas

Mission
Help people discover and do
what they love.

Scale@Pinterest
Service Scale
• 300M+ MAUs
• 120B+ Pins
• 3B+ Boards
Big Data Scale
• 300+ PB on S3
• 6000+ Hive/Hadoop nodes
• 400+ Presto nodes
• 1000+ Spark nodes

Mission & Vision
Principles
Current Status
Key Technologies
Future Plan

Mission
Provide a highly scalable, reliable, secure, performant, efficient and
delightful-to-use big data and machine learning platform to enable rapid
product innovation and help make Pinterest a thriving business.
Vision
A big data and machine learning platform at scale enables every single
engineer at Pinterest to derive trustworthy, actionable insights and
apply ML to solve complex problems with ease and confidence.

Principles
● Put engineers first - make the platform delightful-to-use for all
engineers at Pinterest
● Keep it simple, get it right - build a simple yet sufficient
platform
● Enable speed and quality - enable all engineers at Pinterest to
move fast with scalable, reliable, secure, performant and efficient
solutions made easy by the platform
● Build with reusability and for reusability - embrace open
source technology, build with lego blocks and provide lego blocks to
all engineers at Pinterest

Mission & Vision
Principles
Current Status
Key Technologies
Future Plan

Big Data Platform
[Diagram: platform stack with layers Big Data Platform, Feature Platform, ML Platform]

Feature Platform
[Diagram: platform stack with layers Feature Platform, ML Platform]

Feature Platform - Today
Pinterest’s data graph: Pin/Image/Board/User...
[Figure: xJoin signal graph. Node labels: pin’s text, image info, video info, texts, text languages, text scores, SEO signal, link language, link country, link perf, link scores, safe search, spam, visual signal, catvec_v0, pin’s catvec_v0, catvec_v1, pin’s catvec_v1, topicvec_v4, pin’s topicvec_v4, country vecs, text tokens, landing page, annot_embedding v3, annotation_v2, annotation_v3, annotation_v4]

Galaxy: next-gen feature platform
[Diagram labels: developer, code module, spec, metadata (repeated three times); Metadata-driven framework & dev API; Signal Access & Serving (retrieval API, serving, acl, ...); offline consumers (ML model training); online consumers (ML model serving); ML Platform; BDP]
* incremental dataflow execution engine
* signal data store (“column”-partitioned) and metadata repo (registry, stats)
* dependency management
* governance: enforcement & tracking

ML Platform
[Diagram: platform stack with layers Feature Platform, ML Platform]

Response prediction ML
[Diagram labels: Serving, Training, Profiles, Users, Pins, Boards, Logs, events, content]

Response Prediction Use Cases at Pinterest
● Discovery
○ Home Feed: time-ordered following feed to ML based recommendation feed
○ Related Pins, Search: heuristic to ML ranking
● Ads
○ gCTR, CPI, CVR
● Growth
○ Notifications, NUX topics
● Content
○ Content comprehension
● Shopping
○ CTR prediction
● Protect
○ Spam & Porn, ATO
● … ...

Response prediction ML at Pinterest
Surfaces
● 2014: Home feed ranking; Ads ranking
● 2015: Related Pins ranking
● 2016: Search ranking; Notifications ranking
● 2017: Spam detection
● 2018: NUX topics; Ads retrieval
Scale
● From: < 10 serving hosts; training on laptop
● To: 2500+ serving hosts; training on clusters

[Figure: components surrounding ML Code in a production ML system: Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analytics Tools, Machine Resource Management, Serving Infrastructure, Monitoring & Alerting]
Source: Hidden Technical Debt in Machine Learning Systems, David Sculley et al., Google, NIPS 2015

Much more complex in practice
[Diagram: separate stacks for Related Pins, Ads and Home Feed]
● Stack 1: Learner 1, Parameter Autotuning, Serving & Logging Automation, Feature Extraction 1
● Stack 2: Learner 2, Data Monitoring, Serving & Logging Automation, Feature Extraction 2
● Stack 3: Learner 3, Data Monitoring, Serving & Logging Automation, Feature Extraction 3
● Distributed Training appears in two of the stacks
Similar components, no sharing!
Incomplete stacks

Unified ML Platform
[Diagram: one shared stack (Learner, Parameter Autotuning, Serving & Logging Automation, Feature Extraction, Data Monitoring, Distributed Training) under Related Pins, Ads and Home Feed, plus new use cases: Search, NUX Topic Picker, Notifications]
● Client teams focus on business problems, not infra problems.
● Platform team specializes in infra problems.
● Quick to build new ML applications.

Unified Big Data ML Platform
● Speed & quality
● Single Use Case
○ 0 -> 1 made fast, easy and robust - create an ML model to solve a complex problem
○ 1 -> N made automated - such an ML model is continuously trained, improved, and deployed
● Many Use Cases on the Platform
○ N -> N^2 - most ML models trained and served by the platform

Mission & Vision
Principles
Current Status
Key Technologies
Future Plan

Scorpion Training & Catwalk
● Scorpion Training: abstracts the specific trainer package away from the user (TensorFlow, XGBoost; future: other packages)
● Catwalk: enables running training jobs on a distributed cluster
● Mesos: cluster resource management (CPUs, RAM, GPUs)
● Kubernetes: to replace Mesos in 2018
[Diagram: layered stack joined by "runs on" arrows]
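One way to picture "abstracts the user from the specific trainer package" is a small registry behind a common interface. The sketch below is purely illustrative and is not Scorpion's API: the names `Trainer`, `TRAINERS`, `train` and the `"trainer"` config key are hypothetical, and two toy trainers stand in for real packages such as TensorFlow or XGBoost so the example stays self-contained.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Sequence, Type

Row = Sequence[float]
Model = Callable[[Row], float]


class Trainer(ABC):
    """Hypothetical common interface that every trainer backend implements."""

    @abstractmethod
    def fit(self, rows: Sequence[Row], labels: Sequence[float]) -> Model:
        ...


class MeanBaselineTrainer(Trainer):
    """Toy backend: always predicts the mean training label."""

    def fit(self, rows, labels):
        mean = sum(labels) / len(labels)
        return lambda row: mean


class NearestNeighborTrainer(Trainer):
    """Toy backend: predicts the label of the closest training row."""

    def fit(self, rows, labels):
        data = list(zip(rows, labels))

        def predict(row: Row) -> float:
            _, label = min(
                data,
                key=lambda rl: sum((a - b) ** 2 for a, b in zip(rl[0], row)),
            )
            return label

        return predict


# Hypothetical registry: a real platform would map names to package-backed trainers.
TRAINERS: Dict[str, Type[Trainer]] = {
    "mean_baseline": MeanBaselineTrainer,
    "nearest_neighbor": NearestNeighborTrainer,
}


def train(config: dict, rows: Sequence[Row], labels: Sequence[float]) -> Model:
    """Pick the backend named in the config; callers never import it directly."""
    trainer = TRAINERS[config["trainer"]]()
    return trainer.fit(rows, labels)
```

Under these assumptions, switching packages is a one-line config change, for example `train({"trainer": "nearest_neighbor"}, rows, labels)`, while the calling code stays the same.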

Catwalk
[Diagram labels: Mesos Master; Mesos Agents; Chronos/Aurora; PinBall; TFMesos; TFMesosServer Param Server; TFMesosServer Worker (x2); Update gradients; frameworks: Caffe GPU, SciPy, MXNet, Keras, Caffe, TensorFlow, Torch; Legend]

Linchpin - Easy Feature Definition
Declarative language for using common feature extraction logic.
● Single implementation for both serving & training.
● Heavily optimized.
[Diagram: Interest Match and Annotation Match each reuse the generic "Match" implementation]
pin <- source(TAG="pin", OUTPUTS="p", TYPE="PinJoinRawData")
user <- source(TAG="user", OUTPUTS="u", TYPE="UserJoinRawData")
cat_match <- match(INPUTS=[user.u.categoryVec, pin.p.categoryVec],
MATCH_TYPE="COSINE_SIM")
topic_match <- match(INPUTS=[user.u.topicVec, pin.p.topicVec], ...)
features <- union(INPUTS=[cat_match, topic_match, ...])
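To make the DSL snippet above more concrete, here is a minimal Python sketch of what a `COSINE_SIM` match between a user vector and a pin vector could compute. It is an assumption-laden illustration, not Linchpin's implementation: the function names `cosine_sim` and `extract_features` and the dictionary layout of `user` and `pin` are hypothetical, and only the field names `categoryVec` and `topicVec` come from the slide.

```python
import math
from typing import Dict, Sequence


def cosine_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def extract_features(user: Dict[str, Sequence[float]],
                     pin: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical stand-in for the union of the two match features."""
    return {
        "cat_match": cosine_sim(user["categoryVec"], pin["categoryVec"]),
        "topic_match": cosine_sim(user["topicVec"], pin["topicVec"]),
    }
```

Calling the same `extract_features` function from both the training pipeline and the serving path is one simple way to get the "single implementation for both serving & training" property the slide describes.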

Muse
[Diagram labels: Corpus; Fresh corpus; Indexing pipeline; index builder; index; streaming pipeline; fresh index; Fresh index dispatcher; Perdoc data dispatcher; Searchable doc; model training pipeline; models; Root; Query understanding; Planner; Mixer; Merger; Reranker; Cache; Leaf (x3); Feature log]

Pixie: Graph walks
● The greatest asset of Pinterest is our pin-to-board graph
○ It captures relationships between pins (how objects are organized into collections)
○ Can be used to capture multiple different interactions: pins to boards, clicks by user,...
● We use Pixie for candidate generation: how to quickly go from 2B pins to 1k pins so that ML models can then score each pin separately
● Represent the user as a (set of) pin(s) Q and do a random walk from Q:
○ Bias the walk towards fresh pins, pins in the local user’s language, pins that males/females like
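The candidate-generation idea above can be sketched as a random walk with restarts over a bipartite pin-to-board graph, counting how often each pin is visited. This is a simplified illustration rather than Pixie itself: the function name, the dictionary-based graph, and the `restart_prob`, `num_steps` and `seed` parameters are all assumptions, and the biasing toward fresh or language-matched pins is left out.

```python
import random
from collections import Counter
from typing import Dict, List


def random_walk_candidates(pin_to_boards: Dict[str, List[str]],
                           board_to_pins: Dict[str, List[str]],
                           query_pins: List[str],
                           num_steps: int = 10000,
                           restart_prob: float = 0.5,
                           top_k: int = 10,
                           seed: int = 0) -> List[str]:
    """Return up to top_k pins most often visited by walks started from query_pins."""
    rng = random.Random(seed)  # seeded for repeatable results
    visits: Counter = Counter()
    current = rng.choice(query_pins)
    for _ in range(num_steps):
        # Restart keeps the walk close to the query set Q.
        if rng.random() < restart_prob:
            current = rng.choice(query_pins)
        boards = pin_to_boards.get(current)
        if not boards:
            # Dead end: jump back to a query pin.
            current = rng.choice(query_pins)
            continue
        # One hop pin -> board -> pin.
        board = rng.choice(boards)
        current = rng.choice(board_to_pins[board])
        visits[current] += 1
    # The query pins themselves are not useful as candidates.
    for pin in query_pins:
        visits.pop(pin, None)
    return [pin for pin, _ in visits.most_common(top_k)]
```

A bias such as "prefer fresh pins" could be added by replacing the uniform `rng.choice` over neighbors with a weighted choice, though how Pixie actually weights the walk is not stated on the slide.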

Mission & Vision
Principles
Current Status
Key Technologies
Future Plan

Big Data Platform
● [Product Enablement] Streaming engines
○ Spark Structured Streaming
○ Flink
○ … ...
● [Scalability] Spinner - next gen workflow engine
● [Performance] Hive on Tez
● [Efficiency] Hadoop auto-scaling
● [Future Proofing] Spark on Kubernetes
● [Future Proofing] Hadoop 3.0

ML Platform
[Diagram labels: Developer Frontend; Linchpin DSL; Feature Extraction; Feature Analysis; Real-time Feature Sources; Counting Service; Learner; Scorpion Training; off-the-shelf solutions: TensorFlow ...; Parameter Autotuning; Incremental & Real-Time Training Automation; Model Eval & Comparison; Model Version Management; Model Deploy; Model Runtime Validation; Model Serving; Scorpion Serving; Logging; Data Monitoring]
[Team key: ML Serving Systems, ML Training Platform]

Key Learnings
● Unified big data ML platform greatly accelerates
product innovations
● Data lineage, quality and democracy are vital to
organization scalability
● Speed, quality & delightful-to-use
