Big Data ML Platform at Pinterest
Yongsheng Wu
Pinterest: pinterest.com/yswu
LinkedIn: linkedin.com/in/yongshengwu
Twitter: @yswu
06/17/2019
Pinterest: The World’s Catalog of Ideas
Mission
Help people discover and do
what they love.
Scale@Pinterest
Service Scale
• 300M+ MAUs
• 120B+ Pins
• 3B+ Boards
Big Data Scale
• 300+ PB on S3
• 6000+ Hive/Hadoop nodes
• 400+ Presto nodes
• 1000+ Spark nodes
Mission & Vision
Principles
Current Status
Key Technologies
Future Plan
Mission
Provide a highly scalable, reliable, secure, performant, efficient and
delightful-to-use big data and machine learning platform to enable rapid
product innovation and help make Pinterest a thriving business.
Vision
A big data and machine learning platform at scale enables every single
engineer at Pinterest to derive trustworthy, actionable insights and
apply ML to solve complex problems with ease and confidence.
Principles
● Put engineers first - make the platform delightful-to-use for all
engineers at Pinterest
● Keep it simple, get it right - build a simple yet sufficient
platform
● Enable speed and quality - enable all engineers at Pinterest to
move fast with scalable, reliable, secure, performant and efficient
solutions made easy by the platform
● Build with reusability and for reusability - embrace open
source technology, build with lego blocks and provide lego blocks to
all engineers at Pinterest
Current Status
Big Data Platform
[Diagram: platform layers: Big Data Platform, Feature Platform, ML Platform]
Feature Platform
Pinterest’s data graph: Pin/Image/Board/User...
[Diagram: xJoin over per-pin signals: pin’s text, image info, video info, texts, text languages, text scores, text tokens, SEO signal, link language, link country, link perf, link scores, landing page, safe search, spam, visual signal, country vecs, pin’s catvec_v0, pin’s catvec_v1, pin’s topicvec_v4, annot_embedding v3, annotation_v2, annotation_v3, annotation_v4]
Feature Platform - Today
[Diagram: three developer groups, each with a code module, spec, and metadata; Signal Access & Serving (retrieval API, serving, acl, ...) with offline consumers (ML model training) and online consumers (ML model serving)]
Galaxy: next-gen feature platform
* incremental dataflow execution engine
* signal data store (“column”-partitioned) and metadata repo (registry, stats)
* dependency management
* governance: enforcement & tracking
Metadata-driven framework & dev API
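The dependency management and incremental execution items above can be illustrated with a minimal sketch. It is not Galaxy's implementation: the signal names are borrowed from the data graph slide, while the dependency edges, `SIGNAL_SPECS`, `execution_order`, and `affected_by` are assumptions made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical signal specs: each signal lists the signals it is derived from.
# The edges are illustrative assumptions, not Pinterest's real lineage.
SIGNAL_SPECS = {
    "pin_text": [],
    "text_tokens": ["pin_text"],
    "text_languages": ["pin_text"],
    "annotation_v4": ["text_tokens", "text_languages"],
}


def execution_order(specs):
    """Return an order in which every signal comes after its dependencies."""
    return list(TopologicalSorter(specs).static_order())


def affected_by(changed, specs):
    """Return the set of signals to recompute when `changed` is updated."""
    dirty = {changed}
    # Walking in dependency order lets a change propagate downstream in one pass.
    for name in execution_order(specs):
        if any(dep in dirty for dep in specs[name]):
            dirty.add(name)
    return dirty
```

With a registry of this shape, an incremental engine only needs to rerun the signals returned by `affected_by` rather than the whole graph.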
ML Platform
[Diagram: platform layers: Big Data Platform (BDP), Feature Platform, ML Platform]
Response prediction ML
[Diagram labels: Serving, Training, Profiles (Users, Pins, Boards), Logs, events, content, Visual ML]
Response Prediction Use Cases at Pinterest
● Discovery
○ Home Feed: time-ordered following feed to ML based recommendation feed
○ Related Pins, Search: heuristic to ML ranking
● Ads
○ gCTR, CPI, CVR
● Growth
○ Notifications, NUX topics
● Content
○ Content comprehension
● Shopping
○ CTR prediction
● Protect
○ Spam & Porn, ATO
● … ...
Response prediction ML at Pinterest
Surfaces
● 2014: Home feed ranking; Ads ranking
● 2015: Related Pins ranking
● 2016: Search ranking; Notifications ranking
● 2017: Spam detection
● 2018: NUX topics; Ads retrieval
Scale
● From < 10 serving hosts, with training on a laptop
● To 2500+ serving hosts, with training on clusters
[Diagram: ML Code surrounded by Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analytics Tools, Machine Resource Management, Serving Infrastructure, Monitoring & Alerting]
Hidden Technical Debt in Machine Learning Systems
David Sculley et al., Google, NIPS 2015
Much more complex in practice
[Diagram: three separate stacks for Related Pins, Ads, and Home Feed. Stack 1: Learner 1, Parameter Autotuning, Serving & Logging, Automation, Feature Extraction 1. Stack 2: Learner 2, Data Monitoring, Serving & Logging, Automation, Feature Extraction 2. Stack 3: Learner 3, Data Monitoring, Serving & Logging, Automation, Feature Extraction 3. Distributed Training appears in two of the stacks.]
● Similar components, no sharing!
● Incomplete stacks
Unified ML Platform
[Diagram: one shared stack (Learner, Parameter Autotuning, Serving & Logging, Automation, Feature Extraction, Data Monitoring, Distributed Training) serving Related Pins, Ads, and Home Feed, plus new use cases: Search, NUX Topic Picker, Notifications]
● Client teams focus on business problems, not infra problems.
● Platform team specializes in infra problems.
● Quick to build new ML applications.
Unified Big Data ML Platform
● Speed & quality
● Single Use Case
○ 0 -> 1 made fast, easy and robust - create an ML model to solve a complex problem
○ 1 -> N made automated - the ML model is continuously trained, improved, and deployed
● Many Use Cases on the Platform
○ N -> N² - most ML models are trained and served by the platform
Key Technologies
Scorpion Training & Catwalk
● Scorpion Training: abstracts the user from the specific trainer package used (TensorFlow, XGBoost; future: other packages). Runs on Catwalk.
● Catwalk: enables running training jobs on a distributed cluster.
● Mesos: cluster resource management (CPUs, RAM, GPUs).
● Kubernetes: to replace Mesos in 2018.
[Diagram: Catwalk on Mesos. A Mesos Master and Mesos Agents run TFMesos tasks: a TFMesosServer parameter server and TFMesosServer workers that update gradients. Jobs are launched via Chronos/Aurora and PinBall. Packages shown in the legend: TensorFlow, Caffe, Caffe GPU, Keras, MXNet, SciPy, Torch.]
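The point that Scorpion Training abstracts the user from the specific trainer package could look roughly like the sketch below. This is not Scorpion's API: `Trainer`, `MeanBaselineTrainer`, `TRAINERS`, and `train` are hypothetical names, and the toy backend stands in for a real package such as TensorFlow or XGBoost.

```python
from abc import ABC, abstractmethod


class Trainer(ABC):
    """Common interface that hides which trainer package does the work."""

    @abstractmethod
    def fit(self, rows, labels):
        ...

    @abstractmethod
    def predict(self, row):
        ...


class MeanBaselineTrainer(Trainer):
    """Toy backend standing in for a real package (hypothetical)."""

    def __init__(self):
        self.mean = 0.0

    def fit(self, rows, labels):
        # Learn a single constant: the average label.
        self.mean = sum(labels) / len(labels)
        return self

    def predict(self, row):
        return self.mean


# Hypothetical registry mapping a job-config value to a backend class.
TRAINERS = {"mean_baseline": MeanBaselineTrainer}


def train(config, rows, labels):
    """Pick the backend named in the job config and train it."""
    trainer_cls = TRAINERS[config["trainer"]]
    return trainer_cls().fit(rows, labels)
```

Client code calls `train` with a config and never imports a backend directly, so adding another package means registering one more `Trainer` subclass.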
Scorpion Serving
Linchpin - Easy Feature Definition
Declarative language for using common
feature extraction logic.
● Single implementation for both serving
& training.
● Heavily optimized.
[Diagram: Interest Match and Annotation Match both reuse the generic "Match" implementation]
pin <- source(TAG="pin", OUTPUTS="p", TYPE="PinJoinRawData")
user <- source(TAG="user", OUTPUTS="u", TYPE="UserJoinRawData")
cat_match <- match(INPUTS=[user.u.categoryVec, pin.p.categoryVec],
MATCH_TYPE="COSINE_SIM")
topic_match <- match(INPUTS=[user.u.topicVec, pin.p.topicVec], ...)
features <- union(INPUTS=[cat_match, topic_match, ...])
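To make the `cat_match` line concrete, here is a minimal Python sketch of what a match with MATCH_TYPE="COSINE_SIM" might compute. It is not Linchpin's implementation: `cosine_sim`, `MATCHERS`, and `match` are hypothetical names, and vectors are assumed to be plain lists of floats of equal length.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical dispatch table: one generic match entry point, many match types.
MATCHERS = {"COSINE_SIM": cosine_sim}


def match(inputs, match_type):
    """Apply the named match type to a pair of input vectors."""
    left, right = inputs
    return MATCHERS[match_type](left, right)
```

Calling the same `match` function from both the training pipeline and the serving path is one way to get the single implementation the slide describes, which avoids training/serving skew in feature values.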
[Diagram labeled Muse: search serving architecture. Components shown: Query understanding, Planner, Root, Mixer, Merger, Reranker, Caches, Leaf nodes, Feature log, models, model training pipeline, Corpus, Indexing pipeline, index builder, index, Searchable doc, Fresh corpus, streaming pipeline, fresh index, Fresh index dispatcher, Perdoc data dispatcher]
Pixie: Graph walks
● The greatest asset of Pinterest is our pin-to-board graph
○ It captures relationships between pins (how objects are organized into collections)
○ It can be used to capture multiple different interactions: pins to boards, clicks by user, ...
● We use Pixie for candidate generation: how to quickly go from 2B pins to 1k pins so that ML models can then score each pin separately
● Represent the user as a (set of) pin(s) Q and do a random walk from Q:
○ Bias the walk towards fresh pins, pins in the user’s language, pins that males/females like
[Diagram: Pixie architecture]
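The random walk from a query set Q can be sketched as follows. This is a simplified, unbiased random walk with restart on a pin-to-board bipartite graph, not Pixie's actual algorithm: the function name, the dictionary graph format, and all parameter values are assumptions, and the biasing towards fresh or language-matched pins is left out.

```python
import random
from collections import Counter


def random_walk_candidates(pin_to_boards, board_to_pins, query_pins,
                           steps=10_000, restart_prob=0.5, top_k=3, seed=0):
    """Return the top_k most visited pins reached by walks from query_pins."""
    rng = random.Random(seed)
    visits = Counter()
    current = rng.choice(query_pins)
    for _ in range(steps):
        boards = pin_to_boards.get(current)
        # Restart at a query pin with some probability, or at a dead end.
        if not boards or rng.random() < restart_prob:
            current = rng.choice(query_pins)
            continue
        # One hop: pin -> random board containing it -> random pin on that board.
        board = rng.choice(boards)
        current = rng.choice(board_to_pins[board])
        visits[current] += 1
    # The query pins themselves are not useful as candidates.
    for pin in query_pins:
        visits.pop(pin, None)
    return [pin for pin, _ in visits.most_common(top_k)]
```

Pins that share many boards with Q accumulate the most visits, so the visit counts act as a cheap relevance score that narrows the corpus before the ML ranking models run.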
Future Plan
Big Data Platform
● [Product Enablement] Streaming engines
○ Spark Structured Streaming
○ Flink
○ … ...
● [Scalability] Spinner - next gen workflow engine
● [Performance] Hive on Tez
● [Efficiency] Hadoop auto-scaling
● [Future Proofing] Spark on Kubernetes
● [Future Proofing] Hadoop 3.0
[Diagram: three developer groups, each with a code module, spec, and metadata; Signal Access & Serving (retrieval API, serving, acl, ...) with offline consumers (ML model training) and online consumers (ML model serving)]
Galaxy: next-gen feature platform
* incremental dataflow execution engine
* signal data store (“column”-partitioned) and metadata repo (registry, stats)
* dependency management
* governance: enforcement & tracking
Metadata-driven framework & dev API
ML Platform
[Diagram: ML Platform components on top of BDP, color-coded by team (ML Serving Systems, ML Training Platform): Developer Frontend; Learner; Model Eval & Comparison; Data Monitoring; Feature Analysis; Parameter Autotuning; Model Serving; Logging; off-the-shelf solutions (TensorFlow ...); Scorpion Serving; Scorpion Training; Incremental & Real-Time Training Automation; Model Deploy; Linchpin DSL; Model Version Management; Feature Extraction; Real-time Feature Sources; Counting Service; Model Runtime Validation]
Key Learnings
● Unified big data ML platform greatly accelerates
product innovations
● Data lineage, quality and democracy are vital to
organization scalability
● Speed, quality & delightful-to-use
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Kürzlich hochgeladen

%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 

Kürzlich hochgeladen (20)

%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 

Pinterest - Big Data Machine Learning Platform at Pinterest

  • 1. Big Data ML Platform at Pinterest Yongsheng Wu Pinterest: pinterest.com/yswu LinkedIn: linkedin.com/in/yongshengwu Twitter: @yswu 06/17/2019
  • 2. Pinterest: The World’s Catalog of Ideas
  • 3. Mission Help people discover and do what they love.
  • 4. Scale@Pinterest. Service Scale: 300M+ MAUs, 120B+ Pins, 3B+ Boards. Big Data Scale: 300+ PB on S3, 6000+ Hive/Hadoop nodes, 400+ Presto nodes, 1000+ Spark nodes.
  • 5. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 6. Mission: Provide a highly scalable, reliable, secure, performant, efficient and delightful-to-use big data and machine learning platform to enable rapid product innovation and help make Pinterest a thriving business. Vision: A big data and machine learning platform at scale enables every single engineer at Pinterest to derive trustworthy, actionable insights and apply ML to solve complex problems with ease and confidence.
  • 7. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 8. Principles ● Put engineers first - make the platform delightful-to-use for all engineers at Pinterest ● Keep it simple, get it right - build a simple yet sufficient platform ● Enable speed and quality - enable all engineers at Pinterest to move fast with scalable, reliable, secure, performant and efficient solutions made easy by the platform ● Build with reusability and for reusability - embrace open source technology, build with lego blocks and provide lego blocks to all engineers at Pinterest
  • 9. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 10. Big Data Platform [layer diagram: Big Data Platform, Feature Platform, ML Platform]
  • 12. Feature Platform [layer diagram: Big Data Platform, Feature Platform, ML Platform]
  • 13. Feature Platform - Today. Pinterest’s data graph: Pin/Image/Board/User... xJoin. [diagram labels: pin’s text, image info, video info, texts, text languages, text scores, SEO signal, link language, link country, link perf, link scores, safe search, spam, visual signal, catvec_v0, pin’s catvec_v0, catvec_v1, pin’s catvec_v1, topicvec_v4, pin’s topicvec_v4, country vecs, text tokens, landing page, annot_embedding v3, annotation_v2, annotation_v3, annotation_v4]
  • 14. Galaxy: next-gen feature platform. Metadata-driven framework & dev API: developers supply code modules and spec metadata. * incremental dataflow execution engine * signal data store (“column”-partitioned) and metadata repo (registry, stats) * dependency management * governance: enforcement & tracking. Signal Access & Serving: retrieval API, serving, acl, ...; offline consumers (ML model training); online consumers (ML model serving). [other diagram labels: ML Platform, BDP]
  • 15. ML Platform [layer diagram: Big Data Platform, Feature Platform, ML Platform]
  • 16. Response prediction ML [diagram labels: Serving, Training, Profiles, Users, Pins, Boards, Logs, events, content]
  • 18. Response Prediction Use Cases at Pinterest ● Discovery ○ Home Feed: from time-ordered following feed to ML-based recommendation feed ○ Related Pins, Search: from heuristic to ML ranking ● Ads ○ gCTR, CPI, CVR ● Growth ○ Notifications, NUX topics ● Content ○ Content comprehension ● Shopping ○ CTR prediction ● Protect ○ Spam & Porn, ATO ● …
  • 19. Response prediction ML at Pinterest. Surfaces: 2014: Home feed ranking; Ads ranking. 2015: Related Pins ranking. 2016: Search ranking; Notifications ranking. 2017: Spam detection. 2018: NUX topics; Ads retrieval. Scale: from < 10 serving hosts and training on a laptop to 2500+ serving hosts and training on clusters.
  • 20. [diagram labels: Configuration, Data Verification, Feature Extraction, Process Management Tools, Data Collection, ML Code, Analytics Tools, Machine Resource Management, Serving Infrastructure, Monitoring & Alerting] Source: "Hidden Technical Debt in Machine Learning Systems", David Sculley et al., Google, NIPS 2015
  • 21. Much more complex in practice. Separate stacks for Related Pins, Ads and Home Feed. [diagram labels, in slide order: Learner 1, Parameter Autotuning, Serving & Logging Automation, Feature Extraction 1; Learner 2, Data Monitoring, Serving & Logging Automation, Feature Extraction 2; Learner 3, Data Monitoring, Serving & Logging Automation, Feature Extraction 3; Distributed Training (twice)] Similar components, no sharing! Incomplete stacks.
  • 22. Unified ML Platform. Platform components: Learner, Parameter Autotuning, Serving & Logging Automation, Feature Extraction, Data Monitoring, Distributed Training. Existing use cases: Related Pins, Ads, Home Feed. New use cases: Search, NUX Topic Picker, Notifications. Client teams focus on business problems, not infra problems. Platform team specializes in infra problems. Quick to build new ML applications.
  • 23. Unified Big Data ML Platform ● Speed & quality ● Single Use Case ○ 0 -> 1 made fast, easy and robust - create an ML model to solve a complex problem ○ 1 -> N made automated - such an ML model is continuously trained, improved, and deployed ● Many Use Cases on the Platform ○ N -> N² - most ML models are trained and served by the platform
  • 24. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 25. Scorpion Training & Catwalk. Scorpion Training: abstracts the user from the specific trainer package used (TensorFlow, XGBoost; future: other packages). Catwalk: enables running training jobs on a distributed cluster. Mesos: cluster resource management (CPUs, RAM, GPUs). Kubernetes: to replace Mesos in 2018. [diagram arrow label: runs on]
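The idea on slide 25 of hiding the trainer package behind one interface can be sketched as follows. This is only an illustration under stated assumptions: the `Trainer` interface, the `TRAINERS` registry, the `train` helper and both toy backends are hypothetical names, and the backends are stand-ins for real packages such as TensorFlow or XGBoost, not Scorpion's actual API.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, List, Sequence, Type


class Trainer(ABC):
    """Hypothetical common interface that every trainer backend implements."""

    @abstractmethod
    def fit(self, rows: Sequence[Sequence[float]], labels: Sequence[float]) -> None:
        ...

    @abstractmethod
    def predict(self, row: Sequence[float]) -> float:
        ...


class MeanTrainer(Trainer):
    """Toy backend (stand-in for a real package): predicts the mean label."""

    def fit(self, rows, labels):
        self.mean = sum(labels) / len(labels)

    def predict(self, row):
        return self.mean


class MajorityTrainer(Trainer):
    """Toy backend (stand-in for a real package): predicts the most common label."""

    def fit(self, rows, labels):
        self.majority = Counter(labels).most_common(1)[0][0]

    def predict(self, row):
        return self.majority


# Hypothetical registry: the user picks a backend by name in a config.
TRAINERS: Dict[str, Type[Trainer]] = {
    "mean": MeanTrainer,
    "majority": MajorityTrainer,
}


def train(config: Dict[str, str], rows: List[List[float]], labels: List[float]) -> Trainer:
    """Build the configured backend and fit it; callers never touch the backend class."""
    trainer = TRAINERS[config["trainer"]]()
    trainer.fit(rows, labels)
    return trainer


rows = [[0.0], [1.0], [2.0]]
mean_model = train({"trainer": "mean"}, rows, [1.0, 2.0, 3.0])
majority_model = train({"trainer": "majority"}, rows, [1.0, 1.0, 0.0])
```

Switching backends is a one-word config change, which is the property the slide describes; how Scorpion itself selects and configures trainers is not stated in the deck.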
  • 28. Linchpin - Easy Feature Definition. Declarative language for using common feature extraction logic. ● Single implementation for both serving & training. ● Heavily optimized. ● Interest Match and Annotation Match reuse a generic "Match" implementation. Example:
      pin <- source(TAG="pin", OUTPUTS="p", TYPE="PinJoinRawData")
      user <- source(TAG="user", OUTPUTS="u", TYPE="UserJoinRawData")
      cat_match <- match(INPUTS=[user.u.categoryVec, pin.p.categoryVec], MATCH_TYPE="COSINE_SIM")
      topic_match <- match(INPUTS=[user.u.topicVec, pin.p.topicVec], ...)
      features <- union(INPUTS=[cat_match, topic_match, ...])
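As a rough illustration of what the Linchpin snippet on slide 28 computes, the sketch below evaluates two cosine-similarity match features between a user and a pin. It is plain Python, not Linchpin: the function names and the dictionary layout are assumptions, and the second match is assumed to use cosine similarity as well, since the slide elides its arguments.

```python
import math
from typing import Dict, Sequence


def cosine_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match(user_vec: Sequence[float], pin_vec: Sequence[float],
          match_type: str = "COSINE_SIM") -> float:
    """Generic match: one implementation reused by every user/pin vector pair."""
    if match_type != "COSINE_SIM":
        raise NotImplementedError(match_type)
    return cosine_sim(user_vec, pin_vec)


def extract_features(user: Dict[str, Sequence[float]],
                     pin: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Union of the match features, keyed by feature name."""
    return {
        "cat_match": match(user["categoryVec"], pin["categoryVec"]),
        "topic_match": match(user["topicVec"], pin["topicVec"]),
    }


# Hypothetical toy vectors, for illustration only.
user = {"categoryVec": [1.0, 0.0], "topicVec": [1.0, 1.0]}
pin = {"categoryVec": [1.0, 0.0], "topicVec": [1.0, 0.0]}
features = extract_features(user, pin)
```

Calling the same `extract_features` from both the training pipeline and the serving path mirrors the "single implementation for both serving & training" point on the slide.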
  • 29. Muse [architecture diagram labels: Corpus, Root, Query understanding, Leaf (x3), Searchable doc, index builder, index, Indexing pipeline, model training pipeline, models, Cache, Mixer, Cache, Reranker, Feature log, Merger, corpus, Fresh corpus, streaming pipeline, index builder, fresh index, Fresh index dispatcher, Perdoc data dispatcher, Searchable doc, Planner]
  • 30. Pixie: Graph walks ● The greatest asset of Pinterest is our pin-to-board graph ○ It captures relationships between pins (how objects are organized into collections) ○ Can be used to capture multiple different interactions: pins to boards, clicks by user, ... ● We use Pixie for candidate generation: how to quickly go from 2B pins to 1k pins so that ML models can then score each pin separately ● Represent the user as a (set of) pin(s) Q and do a random walk from Q: ○ Bias the walk towards fresh pins, pins in the local user’s language, pins that males/females like
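The random walk described on slide 30 can be sketched on a toy pin-to-board graph as below. This is a minimal sketch, not Pixie itself: the function name, the restart probability, the `pin_weight` bias hook and the example graph are all assumptions made for illustration. Visit counts stand in for candidate scores.

```python
import random
from collections import Counter
from typing import Callable, Dict, List


def biased_walk(
    pin_to_boards: Dict[str, List[str]],
    board_to_pins: Dict[str, List[str]],
    query_pin: str,
    num_steps: int,
    pin_weight: Callable[[str], float] = lambda pin: 1.0,
    restart_prob: float = 0.5,
    seed: int = 0,
) -> Counter:
    """Walk pin -> board -> pin from query_pin and count visits per pin.

    pin_weight biases which pin is picked on a board (for example a higher
    weight for fresh pins); restart_prob is an assumed parameter that keeps
    the walk close to the query pin.
    """
    rng = random.Random(seed)
    visits: Counter = Counter()
    current = query_pin
    for _ in range(num_steps):
        boards = pin_to_boards.get(current, [])
        if not boards:
            current = query_pin  # dead end: go back to the query pin
            continue
        board = rng.choice(boards)
        pins = board_to_pins[board]
        weights = [pin_weight(p) for p in pins]
        if sum(weights) <= 0:
            current = query_pin  # every pin on this board is filtered out
            continue
        current = rng.choices(pins, weights=weights, k=1)[0]
        visits[current] += 1
        if rng.random() < restart_prob:
            current = query_pin
    return visits


# Hypothetical toy graph: pin "x" sits on a board that is unreachable from "q".
pin_to_boards = {"q": ["b1", "b2"], "a": ["b1"], "b": ["b1", "b2"], "c": ["b2"], "x": ["b3"]}
board_to_pins = {"b1": ["q", "a", "b"], "b2": ["q", "b", "c"], "b3": ["x"]}

visits = biased_walk(pin_to_boards, board_to_pins, "q", num_steps=2000)
candidates = [pin for pin, _ in visits.most_common() if pin != "q"]
```

Pins that share more boards with the query pin collect more visits, so the head of `candidates` is the small set handed to the ML ranking models. A bias such as "prefer pins in the user's language" would be expressed through `pin_weight`.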
  • 32. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 33. Big Data Platform ● [Product Enablement] Streaming engines ○ Spark Structured Streaming ○ Flink ○ … ● [Scalability] Spinner - next gen workflow engine ● [Performance] Hive on Tez ● [Efficiency] Hadoop auto-scaling ● [Future Proofing] Spark on Kubernetes ● [Future Proofing] Hadoop 3.0
  • 34. Galaxy: next-gen feature platform. Metadata-driven framework & dev API: developers supply code modules and spec metadata. * incremental dataflow execution engine * signal data store (“column”-partitioned) and metadata repo (registry, stats) * dependency management * governance: enforcement & tracking. Signal Access & Serving: retrieval API, serving, acl, ...; offline consumers (ML model training); online consumers (ML model serving). [other diagram labels: ML Platform, BDP]
  • 35. ML Platform [diagram labels, in slide order: Learner, Model Eval & Comparison, Data Monitoring, Feature Analysis, Parameter Autotuning, Model Serving, Logging, Developer Frontend, off-the-shelf solutions: Tensorflow ..., Scorpion Serving, Scorpion Training, Incremental & Real-Time Training, Automation, Model Deploy, Linchpin DSL, Model Version Management, Feature Extraction, Real-time Feature Sources, Counting Service, ML Serving Systems, ML Training Platform, Team key, Model Runtime Validation]
  • 36. Mission & Vision Principles Current Status Key Technologies Future Plan
  • 37. Key Learnings ● Unified big data ML platform greatly accelerates product innovations ● Data lineage, quality and democracy are vital to organization scalability ● Speed, quality & delightful-to-use