Scaling Pinterest
Marty Weiner
Cloud Ninja

Yash Nelapati
Ascii Artist

Monday, November 11, 13
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/scaling-pinterest

Presented at QCon San Francisco
www.qconsf.com
Evolution

Growth
March 2010
[Chart: page views per day, Mar 2010 to May 2012]

• RackSpace
• 1 small Web Engine
• 1 small MySQL DB
• 1 Engineer + 2 Founders

Growth
January 2011
[Chart: page views per day, Mar 2010 to Jan 2012]

• Amazon EC2 + S3 + CloudFront
• 1 NGinX, 4 Web Engines
• 1 MySQL DB + 1 Read Slave
• 1 Task Queue + 2 Task Processors
• 1 MongoDB
• 2 Engineers + 2 Founders

Growth
September 2011
[Chart: page views per day, Mar 2010 to May 2012]

• Amazon EC2 + S3 + CloudFront
• 2 NGinX, 16 Web Engines + 2 API Engines
• 5 Functionally Sharded MySQL DB + 9 read slaves
• 4 Cassandra Nodes
• 15 Membase Nodes (3 separate clusters)
• 8 Memcache Nodes
• 10 Redis Nodes
• 3 Task Routers + 4 Task Processors
• 4 Elastic Search Nodes
• 3 Mongo Clusters
• 3 Engineers (8 Total)

It will fail. Keep it simple.

If you’re the biggest user of a
technology, the challenges will
be greatly amplified

Growth
January 2012

Growth
April 2012
[Chart: page views per day, Mar 2010 to May 2012]

• Amazon EC2 + S3 + Edge Cast
• 135 Web Engines + 75 API Engines
• 10 Service Instances
• 80 MySQL DBs (m1.xlarge) + 1 slave each
• 110 Redis Instances
• 60 Memcache Instances
• 2 Redis Task Manager + 60 Task Processors
• 3rd party sharded Solr
• 12 Engineers
  • 1 Data Infrastructure
  • 1 Ops
  • 2 Mobile
  • 8 Generalists
• 10 Non-Engineers

Growth
April 2013
[Chart: page views per day, April 2012 to April 2013]

• Amazon EC2 + S3 + Edge Cast
• 400+ Web Engines + 400+ API Engines
• 70+ MySQL DBs (hi.4xlarge on SSDs) + 1 slave each
• 100+ Redis Instances
• 230+ Memcache Instances
• 10 Redis Task Manager + 500 Task Processors
• 65+ Engineers (130+ total)
• 8 services (80 instances)
• Sharded Solr
• 20 HBase
• 12 Kafka + Azkaban
• 8 Zookeeper Instances
• 12 Varnish
• 65+ Engineers
  • 7 Data Infrastructure + Science
  • 7 Search and Discovery
  • 9 Business and Platform
  • 6 Spam, Abuse, Security
  • 9 Web
  • 9 Mobile
  • 2 Growth
  • 10 Infrastructure
  • 6 Ops
• 65+ Non-Engineers

Technologies

Arch Overview
[Architecture diagram; components as labeled on the slide]
• ELB
• Puppet
• StatsD
• Routing & Filtering (Varnish)
• Task Queue (Redis)
• Web App (Python)
• API App (Python / JS / HTML)
• Monit
• Sensu
• Task Processing (Python/Pyres)
• All connection pairings managed by ZooKeeper
• MySQL Service (Java/Finagle)
• Images (S3 + CDN)
• Memcache Mux (Nutcracker)
• Sharded MySQL
• Memcache
• Follower Service (Python/Thrift)
• Feed Service (Python/Thrift)
• Redis
• Search Service (Python/Thrift)
• HBase
• Spam Service (Python/Thrift)

Data Pipeline
[Pipeline diagram; components as labeled on the slide]
• Web App (Python)
• API App (Python)
• Task Processing (Python/Pyres)
• Kafka
• S3 Copier
• Tripwire (Spam)
• S3
• Qubole
• Pinball
• Redshift

Web App
[Diagram; components as labeled on the slide]
• NGinX
• Website Rendering (x8) (Python / JS / HTML)
• API

Our MySQL Sharding?
http://www.infoq.com/presentations/Pinterest

Choosing Your Tech

Questions to ask
• Does it meet your needs?
• How mature is the product?
• Is it commonly used? Can you hire people who have used it?
• Is the community active?
• How robust is it to failure?
• How well does it scale? Will you be the biggest user?
• Does it have good debugging tools? A profiler? Backup software?
• Is the cost justified?

Hosting

Why Amazon Web Services (AWS)?
• Variety of servers running Linux
• Very good peripherals: load balancing, DNS, map
reduce, basic security, and more
• Good reliability
• Very active dev community
• Not cheap, but...
• New instances ready in seconds

Hosting

AWS Usage
• Route 53 for DNS
• ELB for 1st tier load balance
• EC2 Ubuntu Linux
• Varnish layer
• All web, API, background appliances
• All services
• All databases and caches
• S3 for images, logs

Code

Why Python?
• Extremely mature
• Well known and well liked
• Solid active community
• Very good libraries specifically targeted to web
development
• Effective rapid prototyping
• Open Source

Some Java and Go...
• Faster, lower variance response time

Code

Python Usage
• All web backend, API, and related business logic
• Most services

Java and Go Usage
• Varnish plugins
• Search indexers
• High frequency services (e.g., MySQL service)

Production Data

Why MySQL and Memcache?
• Extremely mature
• Well known and well liked
• (MySQL) Rarely catastrophic loss of data
• Response time increases linearly with request rate
• Very good software support: XtraBackup, Innotop, Maatkit
• Solid active community
• Open Source

Production Data

MySQL and Memcache Usage
• Storage / Caching of core data
• Users, boards, pins, comments, domains
• Mappings (e.g., users to boards, user likes, repin info)
• Legal compliance data

Production Data

Why Redis?
• Well known and well liked
• Active community
• Consistently good performance
• Variety of convenient and efficient data structures
• 3 Flavors of Persistence: Now, Snapshot, Never
• Open Source

Production Data

Redis Usage
• Follower data
• Configurations
• Public feed pin IDs
• Caching of various core mappings (e.g., board to pins)

Production Data

Why HBase?
• Small, but growing loyal community
• Difficult to hire for, but...
• Non-volatile, O(1), extremely fast and efficient storage
• Strong Hadoop integration
• Consistently good performance
• Used by Facebook (bigger than us)
• Seems to work well
• Open Source

Production Data

HBase Usage
• User feeds (pin IDs are pushed to feeds)
• Rich pin details
• Spam features
• User relationships to pins

Production Data

What happened to Cassandra,
Mongo, ES, and Membase?
• Does it meet your needs?
• How mature is the product?
• Is it commonly used? Can you hire people who have used it?
• Is the community active? Can you get help?
• How robust is it to failure?
• How well does it scale? Will you be the biggest user?
• Does it have good debugging tools? A profiler? Backup software?
• Is the cost justified?

A 2nd chance...

A 2nd Chance

Stuff we could have done better
• Logging on day 1 (StatsD, Kafka, Map Reduce)
• Log every request, event, signup
• Basic analytics
• Recovery from data corruption or failure
• Alerting on day 1
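
As a rough illustration of how little code "logging on day 1" needs, the sketch below formats metrics in the StatsD text protocol (name:value|type) and sends them over UDP. The metric names, the helper functions, and the choice to swallow socket errors are assumptions made for this example, not details from the talk; 8125 is the usual StatsD port.

```python
import socket


def statsd_line(name: str, value: int, metric_type: str = "c") -> str:
    """Format one metric in the StatsD text protocol: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> bool:
    """Send one metric over UDP. Returns False instead of raising, so that
    a logging problem never breaks the request being logged."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(line.encode("ascii"), (host, port))
        return True
    except OSError:
        return False


# Hypothetical metric names for the kinds of events the slide lists.
signup = statsd_line("signup.completed", 1)            # counter
latency = statsd_line("request.pin_create", 42, "ms")  # timer, in milliseconds
send_metric(signup)
send_metric(latency)
```

UDP is fire-and-forget, so the calls above cost almost nothing even when no StatsD daemon is listening.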

A 2nd Chance

Stuff we could have done better
• Shard our MySQL storage much earlier
• Once you start relying on read slaves, start the
timebomb countdown
• We also fell into the NoSQL trap (Membase,
Cassandra, Mongo, etc)
• Pyres for background tasks day 1
• Hire technical operations eng earlier
• Chef / Puppet earlier
• Unit testing earlier (Jenkins for builds)

A 2nd Chance

Stuff we could have done better
• A/B testing earlier
• Decider on top of Zookeeper WATCH
• Progressive roll out
• Kill switches
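
The decider idea can be sketched as a per-feature percentage plus a stable hash of the user ID. This is a minimal in-memory stand-in, not Pinterest's implementation: the slide keeps the values in ZooKeeper and relies on WATCH to push changes, whereas the class name, method names, and hashing scheme here are assumptions for illustration.

```python
import hashlib


class Decider:
    """In-memory stand-in for a decider. A real deployment would load the
    percentages from ZooKeeper and refresh them when a watch fires."""

    def __init__(self):
        self._percent = {}  # feature name -> rollout percentage, 0..100

    def set_percent(self, feature: str, percent: int) -> None:
        # Clamp so a bad value cannot exceed the valid range.
        self._percent[feature] = max(0, min(100, percent))

    def is_enabled(self, feature: str, user_id: int) -> bool:
        percent = self._percent.get(feature, 0)  # unknown features are off
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100  # stable 0..99
        return bucket < percent


decider = Decider()
decider.set_percent("new_feed", 10)    # progressive roll out: start at 10%
show_new_feed = decider.is_enabled("new_feed", user_id=12345)
decider.set_percent("new_feed", 0)     # kill switch: off for everyone
```

Because each user always lands in the same bucket, raising the percentage only adds users; nobody who already has the feature loses it until the value is lowered.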

What’s next?

Looking Forward
• Continually improve Pinner experience
• Help Pinners discover more of the things they love
• Better uptime and lower latency
• Faster development times
• Reduce spam and abuse
• Continually improve collaboration and build bigger,
better, faster products
• 180 Pinployees and beyond

Have fun

marty@pinterest.com
pinterest.com/martaaay

yashh@pinterest.com
pinterest.com/yashh
My 2nd Chance

If I could do it all over again...
• Stronger ACID transactional guarantees across multiple
systems
• Currently have: sometimes A, best effort C, I, D,
no silent failure
• Want: sometimes A, eventual C, I, D, no silent
completion

My 2nd Chance

Transactional tasks
• All tasks become a dependency tree of repeatable
synchronous or asynchronous actions
• All actions must be repeatable
• Otherwise, must add repeatability
• All tasks get a unique transaction number
• Counters are tricky
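
One way to see why counters are tricky: a plain increment is not repeatable, because running it twice counts twice. The sketch below shows one possible way to add repeatability using the task's transaction number. The class and its in-memory set are assumptions for illustration, not the speakers' design, and a real store would need to expire old transaction numbers.

```python
class RepeatableCounter:
    """Counter whose increments can be safely repeated by a retried task."""

    def __init__(self):
        self.value = 0
        self._applied = set()  # transaction numbers already counted

    def increment(self, txn_id: str, amount: int = 1) -> int:
        # Apply the increment only the first time this transaction is seen.
        if txn_id not in self._applied:
            self._applied.add(txn_id)
            self.value += amount
        return self.value


repin_count = RepeatableCounter()
repin_count.increment("txn-42")
repin_count.increment("txn-42")  # retry of the same task: no double count
repin_count.increment("txn-43")
```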

My 2nd Chance

Transactional tasks
• All tasks become a dependency tree of repeatable
synchronous or asynchronous actions
• Sync actions are executed in order
• Async actions are executed in any order
• Repeat until successful or too many failures
• Too many failures -> put in per task failure queue
• Gives eventual C, I, D
• No silent completion and A require extra effort
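
The retry rules above can be sketched as a small runner. This is a simplified, single-process illustration under stated assumptions: the Task class, function names, and the max_attempts default are hypothetical, and the async actions simply run one after another here, which is one of the "any order" executions the slide allows.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    txn_id: str  # unique transaction number for the whole task
    sync_actions: List[Callable[[], None]] = field(default_factory=list)
    async_actions: List[Callable[[], None]] = field(default_factory=list)


def run_with_retries(action: Callable[[], None], max_attempts: int) -> bool:
    """Repeat a repeatable action until it succeeds or attempts run out."""
    for _ in range(max_attempts):
        try:
            action()
            return True
        except Exception:
            continue
    return False


def run_task(task: Task, failure_queue: deque, max_attempts: int = 3) -> bool:
    # Sync actions run in order; a failure stops the actions that depend on it.
    for action in task.sync_actions:
        if not run_with_retries(action, max_attempts):
            failure_queue.append((task.txn_id, getattr(action, "__name__", repr(action))))
            return False
    # Async actions are independent of each other, so one failure
    # does not block the rest.
    ok = True
    for action in task.async_actions:
        if not run_with_retries(action, max_attempts):
            failure_queue.append((task.txn_id, getattr(action, "__name__", repr(action))))
            ok = False
    return ok


attempts = {"n": 0}


def flaky_write():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("transient failure")


def always_fails():
    raise IOError("service down")


failures = deque()  # stands in for the per-task failure queue
ok = run_task(Task("txn-1", [flaky_write], [always_fails]), failures)
```

Nothing fails silently in this sketch: an action either eventually succeeds or ends up in the failure queue with its transaction number.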

My 2nd Chance

Transactional tasks example
• Pin create sync
• Write empty pin object
• Write pin ID to board, likes, user’s pins, clear caches
• Write pin object
• Pin not shown until pin object created -> Atomicity!
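
The ordering in the sync example can be sketched with plain dictionaries standing in for the real stores. All names and the data layout below are assumptions for illustration. The point is that every step can be repeated, and that readers ignore a pin until its final write lands.

```python
pins = {}        # pin_id -> pin data, or None while only the placeholder exists
board_pins = {}  # board_id -> set of pin IDs


def write_empty_pin(pin_id):
    # Step 1: reserve the ID. setdefault leaves existing data alone on a repeat.
    pins.setdefault(pin_id, None)


def write_pin_id_to_board(pin_id, board_id):
    # Step 2: adding the same ID to a set twice has no further effect.
    board_pins.setdefault(board_id, set()).add(pin_id)


def write_pin_object(pin_id, data):
    # Step 3: the final write, which makes the pin visible.
    pins[pin_id] = data


def visible_pins(board_id):
    # Readers skip placeholders, so a half-finished create is never shown.
    return [pins[p] for p in sorted(board_pins.get(board_id, set()))
            if pins.get(p) is not None]


write_empty_pin(101)
write_pin_id_to_board(101, board_id=7)
before = visible_pins(7)   # task interrupted here: nothing is shown yet

write_empty_pin(101)       # the retry repeats every step from the top
write_pin_id_to_board(101, board_id=7)
write_pin_object(101, {"title": "Espresso machine"})
after = visible_pins(7)
```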

My 2nd Chance

Transactional tasks example
• Pin create async
• Write pin to required user feeds and public feeds
• Feeds are sorted sets. Reinsertion is okay.
• Send emails, Facebook Likes, Twitter Tweets
• Before send, check / record in temporary storage
-> Gives (temporary) repeatability
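
Both halves of the async example can be sketched briefly. Below, a dictionary stands in for a sorted-set feed and another for the temporary storage; the function names, the one-hour window, and the outbox list are assumptions for illustration. Recording before sending, as the slide describes, means a crash between the two steps drops the message rather than duplicating it.

```python
import time

feed = {}          # pin_id -> score; a dict stands in for a sorted set
sent_markers = {}  # (txn_id, channel) -> expiry time; the temporary storage
outbox = []        # stands in for the email / social network senders


def add_to_feed(pin_id, score):
    # Re-adding an existing member only rewrites its score.
    feed[pin_id] = score


def send_once(txn_id, channel, message, ttl_seconds=3600, now=None):
    """Send unless this transaction already sent on this channel recently."""
    now = time.time() if now is None else now
    key = (txn_id, channel)
    expiry = sent_markers.get(key)
    if expiry is not None and expiry > now:
        return False                       # a repeat within the window: skip
    sent_markers[key] = now + ttl_seconds  # record first, then send
    outbox.append((channel, message))
    return True


add_to_feed(101, 1384128000.0)
add_to_feed(101, 1384128000.0)  # retry: the feed still holds one entry
first = send_once("txn-1", "email", "Your pin was repinned", now=0)
second = send_once("txn-1", "email", "Your pin was repinned", now=10)
```

The repeatability is only temporary because the marker expires: a retry that arrives after the window would send again.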


 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

Scaling Pinterest

  • 1. Scaling Pinterest Marty Weiner Cloud Ninja Yash Nelapati Ascii Artist Monday, November 11, 13
  • 2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/scaling-pinterest InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 3. Presented at QCon San Francisco www.qconsf.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 5. Growth March 2010 (chart: page views per day, Mar 2010 to May 2012)
  • 7. Growth March 2010 • RackSpace • 1 small Web Engine • 1 small MySQL DB • 1 Engineer + 2 Founders
  • 10. Growth January 2011 (chart: page views per day, Mar 2010 to Jan 2012)
  • 12. Growth January 2011 • Amazon EC2 + S3 + CloudFront • 1 NGinX, 4 Web Engines • 1 MySQL DB + 1 Read Slave • 1 Task Queue + 2 Task Processors • 1 MongoDB • 2 Engineers + 2 Founders
  • 14. Growth September 2011 (chart: page views per day, Mar 2010 to May 2012)
  • 16. Growth September 2011 • Amazon EC2 + S3 + CloudFront • 2 NGinX, 16 Web Engines + 2 API Engines • 5 Functionally Sharded MySQL DB + 9 read slaves • 4 Cassandra Nodes • 15 Membase Nodes (3 separate clusters) • 8 Memcache Nodes • 10 Redis Nodes • 3 Task Routers + 4 Task Processors • 4 Elastic Search Nodes • 3 Mongo Clusters • 3 Engineers (8 Total)
  • 17. It will fail. Keep it simple.
  • 18. If you’re the biggest user of a technology, the challenges will be greatly amplified.
  • 20. Growth April 2012 (chart: page views per day, Mar 2010 to May 2012)
  • 22. Growth April 2012 • Amazon EC2 + S3 + Edge Cast • 135 Web Engines + 75 API Engines • 10 Service Instances • 80 MySQL DBs (m1.xlarge) + 1 slave each • 110 Redis Instances • 60 Memcache Instances • 2 Redis Task Manager + 60 Task Processors • 3rd party sharded Solr
  • 23. Growth April 2012 • 12 Engineers: 1 Data Infrastructure, 1 Ops, 2 Mobile, 8 Generalists • 10 Non-Engineers
  • 25. Growth April 2013 (chart: page views per day, April 2012 to April 2013)
  • 27. Growth April 2013 • Amazon EC2 + S3 + Edge Cast • 400+ Web Engines + 400+ API Engines • 70+ MySQL DBs (hi.4xlarge on SSDs) + 1 slave each • 100+ Redis Instances • 230+ Memcache Instances • 10 Redis Task Manager + 500 Task Processors • 65+ Engineers (130+ total)
  • 28. Growth April 2013 • Amazon EC2 + S3 + Edge Cast • 400+ Web Engines + 400+ API Engines • 70+ MySQL DBs (hi.4xlarge on SSDs) + 1 slave each • 100+ Redis Instances • 230+ Memcache Instances • 10 Redis Task Manager + 500 Task Processors • 65+ Engineers (130+ total) • 8 services (80 instances) • Sharded Solr • 20 HBase • 12 Kafka + Azkaban • 8 Zookeeper Instances • 12 Varnish
  • 29. Growth April 2013 • 65+ Engineers: 7 Data Infrastructure + Science; 7 Search and Discovery; 9 Business and Platform; 6 Spam, Abuse, Security; 9 Web; 9 Mobile; 2 Growth; 10 Infrastructure; 6 Ops • 65+ Non-Engineers
  • 32. Arch Overview (diagram labels) • ELB • Puppet • StatsD • Monit • Sensu • Routing & Filtering (Varnish) • Task Queue (Redis) • Web App (Python) • API App (Python / JS / HTML) • Task Processing (Python/Pyres) • All connection pairings managed by ZooKeeper • MySQL Service (Java/Finagle) • Images (S3 + CDN) • Memcache Mux (Nutcracker) • Sharded MySQL • Memcache • Follower Service (Python/Thrift) • Feed Service (Python/Thrift) • Redis • Search Service (Python/Thrift) • HBase • Spam Service (Python/Thrift)
  • 33. Data Pipeline (diagram labels) • Web App (Python) • API App (Python) • Task Processing (Python/Pyres) • Kafka • S3 Copier • Tripwire (Spam) • S3 • Qubole • Pinball • Redshift
  • 34. Web App (diagram labels) • NGinX • Website Rendering (x8) (Python / JS / HTML) • API
  • 36. Choosing Your Tech Questions to ask • Does it meet your needs? • How mature is the product? • Is it commonly used? Can you hire people who have used it? • Is the community active? • How robust is it to failure? • How well does it scale? Will you be the biggest user? • Does it have good debugging tools? Profiler? Backup software? • Is the cost justified?
  • 37. Hosting Why Amazon Web Services (AWS)? • Variety of servers running Linux • Very good peripherals: load balancing, DNS, map reduce, basic security, and more • Good reliability • Very active dev community • Not cheap, but...
  • 38. Hosting Why Amazon Web Services (AWS)? • Variety of servers running Linux • Very good peripherals: load balancing, DNS, map reduce, basic security, and more • Good reliability • Very active dev community • Not cheap, but... • New instances ready in seconds
  • 39. Hosting AWS Usage • Route 53 for DNS • ELB for 1st tier load balance • EC2 Ubuntu Linux • Varnish layer • All web, API, background appliances • All services • All databases and caches • S3 for images, logs
  • 40. Code Why Python? • Extremely mature • Well known and well liked • Solid active community • Very good libraries specifically targeted to web development • Effective rapid prototyping • Open Source
  • 41. Code Why Python? • Extremely mature • Well known and well liked • Solid active community • Very good libraries specifically targeted to web development • Effective rapid prototyping • Open Source Some Java and Go... • Faster, lower variance response time
  • 42. Code Python Usage • All web backend, API, and related business logic • Most services
  • 43. Code Python Usage • All web backend, API, and related business logic • Most services Java and Go Usage • Varnish plugins • Search indexers • High frequency services (e.g., MySQL service)
  • 44. Production Data Why MySQL and Memcache? • Extremely mature • Well known and well liked • (MySQL) Rarely catastrophic loss of data • Response time to request rate increases linearly • Very good software support: XtraBackup, Innotop, Maatkit • Solid active community • Open Source
  • 45. Production Data MySQL and Memcache Usage • Storage / Caching of core data • Users, boards, pins, comments, domains • Mappings (e.g., users to boards, user likes, repin info) • Legal compliance data
  • 46. Production Data Why Redis? • Well known and well liked • Active community • Consistently good performance • Variety of convenient and efficient data structures • 3 Flavors of Persistence: Now, Snapshot, Never • Open Source
  • 47. Production Data Redis Usage • Follower data • Configurations • Public feed pin IDs • Caching of various core mappings (e.g., board to pins)
  • 48. Production Data Why HBase? • Small, but growing loyal community • Difficult to hire for, but... • Non-volatile, O(1), extremely fast and efficient storage • Strong Hadoop integration • Consistently good performance • Used by Facebook (bigger than us) • Seems to work well • Open Source
  • 49. Production Data HBase Usage • User feeds (pin IDs are pushed to feeds) • Rich pin details • Spam features • User relationships to pins
  • 50. Production Data What happened to Cassandra, Mongo, ES, and Membase? • Does it meet your needs? • How mature is the product? • Is it commonly used? Can you hire people who have used it? • Is the community active? Can you get help? • How robust is it to failure? • How well does it scale? Will you be the biggest user? • Does it have good debugging tools? Profiler? Backup software? • Is the cost justified?
  • 51. A 2nd chance...
  • 52. A 2nd Chance Stuff we could have done better • Logging on day 1 (StatsD, Kafka, Map Reduce) • Log every request, event, signup • Basic analytics • Recovery from data corruption or failure • Alerting on day 1
  • 53. A 2nd Chance Stuff we could have done better • Shard our MySQL storage much earlier • Once you start relying on read slaves, start the timebomb countdown • We also fell into the NoSQL trap (Membase, Cassandra, Mongo, etc.) • Pyres for background tasks day 1 • Hire technical operations eng earlier • Chef / Puppet earlier • Unit testing earlier (Jenkins for builds)
  • 54. A 2nd Chance Stuff we could have done better • A/B testing earlier • Decider on top of Zookeeper WATCH • Progressive roll out • Kill switches
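
The "decider" idea on slide 54 (progressive roll out plus kill switches, driven by watched configuration) might look roughly like the sketch below. This is not Pinterest's implementation: `FakeConfigStore` is a hypothetical in-memory stand-in for ZooKeeper watches, and the class names, the `new_feed` feature name, and the hash-into-100-buckets scheme are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, List


class FakeConfigStore:
    """Hypothetical in-memory stand-in for ZooKeeper: stores values, notifies watchers."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}
        self._watchers: List[Callable[[str, int], None]] = []

    def watch(self, callback: Callable[[str, int], None]) -> None:
        self._watchers.append(callback)
        # Replay current values so a new watcher starts with a full view.
        for key, value in self._values.items():
            callback(key, value)

    def set(self, key: str, value: int) -> None:
        self._values[key] = value
        for callback in self._watchers:
            callback(key, value)


class Decider:
    """Keeps a local copy of rollout percentages, refreshed by watch callbacks."""

    def __init__(self, store: FakeConfigStore) -> None:
        self._percent: Dict[str, int] = {}
        store.watch(self._on_change)

    def _on_change(self, feature: str, percent: int) -> None:
        # Clamp to 0..100; setting 0 acts as the kill switch.
        self._percent[feature] = max(0, min(100, percent))

    def is_enabled(self, feature: str, user_id: int) -> bool:
        percent = self._percent.get(feature, 0)  # unknown features stay off
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100  # stable bucket per (feature, user)
        return bucket < percent
```

Because each user always hashes to the same bucket, raising the percentage only adds users, and `store.set("new_feed", 0)` turns the feature off for everyone on the next callback.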
  • 55. What’s next? Looking Forward • Continually improve Pinner experience • Help Pinners discover more of the things they love • Better uptime and lower latency • Faster development times • Reduce spam and abuse • Continually improve collaboration and build bigger, better, faster products • 180 Pinployees and beyond
  • 57. marty@pinterest.com • pinterest.com/martaaay • yashh@pinterest.com • pinterest.com/yashh
  • 59. My 2nd Chance If I could do it all over again... • Stronger ACID transactional guarantees across multiple systems • Currently have: sometimes A, best effort C, I, D, no silent failure • Want: sometimes A, eventual C, I, D, no silent completion
  • 60. My 2nd Chance Transactional tasks • All tasks become a dependency tree of repeatable synchronous or asynchronous actions • All actions must be repeatable • Otherwise, must add repeatability • All tasks get a unique transaction number • Counters are tricky
  • 61. My 2nd Chance Transactional tasks • All tasks become a dependency tree of repeatable synchronous or asynchronous actions • Sync actions are executed in order • Async actions are executed in any order • Repeat until successful or too many failures • Too many failures -> put in per-task failure queue • Gives eventual C, I, D • No silent completion and A require extra effort
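
The retry rules on slides 60 and 61 can be sketched as a small task runner. This is an illustrative simplification, not the speakers' design: the dependency tree is flattened to one list of sync actions followed by one list of async actions, the async actions run in-process instead of on a queue, and every name here (`Task`, `run_task`, `failure_queues`, `max_attempts`) is hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An action receives the task's transaction number and must be safe to repeat.
Action = Callable[[int], None]

_txn_counter = itertools.count(1)


@dataclass
class Task:
    name: str
    sync_actions: List[Action] = field(default_factory=list)
    async_actions: List[Action] = field(default_factory=list)
    # Every task gets its own transaction number.
    txn_id: int = field(default_factory=lambda: next(_txn_counter))


def run_with_retries(action: Action, txn_id: int, max_attempts: int) -> bool:
    """Repeat an action until it succeeds or the attempt budget runs out."""
    for _ in range(max_attempts):
        try:
            action(txn_id)
            return True
        except Exception:
            continue  # repeatable actions make a blind retry safe
    return False


def run_task(task: Task, failure_queues: Dict[str, List[Task]],
             max_attempts: int = 3) -> bool:
    # Sync actions run strictly in order; a permanent failure stops the task.
    for action in task.sync_actions:
        if not run_with_retries(action, task.txn_id, max_attempts):
            failure_queues.setdefault(task.name, []).append(task)
            return False
    # Async actions are independent, so one failing does not block the others.
    all_ok = True
    for action in task.async_actions:
        if not run_with_retries(action, task.txn_id, max_attempts):
            all_ok = False
    if not all_ok:
        failure_queues.setdefault(task.name, []).append(task)
    return all_ok
```

A task that lands in its failure queue can be replayed later with the same `txn_id`, which only works because every action is required to be repeatable.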
  • 62. My 2nd Chance Transactional tasks example • Pin create sync • Write empty pin object • Write pin ID to board, likes, user’s pins, clear caches • Write pin object • Pin not shown until pin object created -> Atomicity!
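
The ordering on slide 62 can be sketched with plain dictionaries standing in for the real stores. The class and method names are hypothetical and only the board mapping is modeled; likes, the user's pins, and cache clearing are left out. The point is that readers skip any pin whose object has not been written yet, so a half-finished create is never shown.

```python
from typing import Dict, List, Optional


class PinStore:
    """Hypothetical in-memory stand-in for pin and board storage."""

    def __init__(self) -> None:
        self.pins: Dict[int, Optional[dict]] = {}
        self.board_pins: Dict[int, List[int]] = {}

    def write_empty_pin(self, pin_id: int) -> None:
        # Step 1: reserve the ID. setdefault keeps a retry from wiping real data.
        self.pins.setdefault(pin_id, None)

    def add_pin_to_board(self, board_id: int, pin_id: int) -> None:
        # Step 2: record the mapping. The membership check makes it repeatable.
        ids = self.board_pins.setdefault(board_id, [])
        if pin_id not in ids:
            ids.append(pin_id)

    def write_pin(self, pin_id: int, data: dict) -> None:
        # Step 3: the pin becomes visible only once this write lands.
        self.pins[pin_id] = data

    def visible_pins(self, board_id: int) -> List[dict]:
        # Readers ignore IDs whose pin object is still the empty placeholder.
        result = []
        for pin_id in self.board_pins.get(board_id, []):
            pin = self.pins.get(pin_id)
            if pin is not None:
                result.append(pin)
        return result
```

If the process dies after step 2, the board holds an ID that `visible_pins` filters out, and rerunning all three steps is harmless.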
  • 63. My 2nd Chance Transactional tasks example • Pin create async • Write pin to required user feeds and public feeds • Feeds are sorted sets. Reinsertion is okay. • Send emails, Facebook Likes, Twitter Tweets • Before send, check / record in temporary storage -> Gives (temporary) repeatability
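
Both async ideas on slide 63 can be sketched in a few lines: a feed that behaves like a sorted set, so inserting the same pin twice changes nothing, and a short-lived send log that is checked before each outbound message. All names are hypothetical, a dict stands in for the temporary storage, and recording the key after a successful send (rather than before) is a choice made here so that a failed send can still be retried.

```python
import time
from typing import Callable, Dict, List


class Feed:
    """Sorted-set-like feed: one score per pin ID, so re-adding is harmless."""

    def __init__(self) -> None:
        self._scores: Dict[int, float] = {}

    def add(self, pin_id: int, score: float) -> None:
        self._scores[pin_id] = score

    def newest_first(self) -> List[int]:
        return sorted(self._scores, key=lambda pin_id: self._scores[pin_id],
                      reverse=True)


class SendLog:
    """Temporary record of sends; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def send_once(self, key: str, send: Callable[[], None]) -> bool:
        now = self._clock()
        expires = self._expires_at.get(key)
        if expires is not None and expires > now:
            return False  # already sent recently, so skip the duplicate
        send()  # if this raises, nothing is recorded and a retry will resend
        self._expires_at[key] = now + self._ttl
        return True
```

The protection lasts only as long as the entry does, which matches the "(temporary) repeatability" wording: a retry that arrives after the TTL would send again.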
  • 64. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/scaling-pinterest