SCALING OUR SAAS BACKEND
WITH POSTGRESQL
OLIVER SEEMANN, BIDMANAGEMENT GMBH
BWB MEETUP, 2013-10-28
THIS TALK IS ABOUT …

Gigabytes

Terabytes
PRODUCTIVITY TOOLS FOR
ONLINE MARKETERS

Automatic Bid Management for
Auctioned Ads

“Organic” Search
SIGNIFICANT AMOUNTS OF DATA

10,000 Campaigns
5 Million Keywords
4 Million Ads
per AdWords account
SIGNIFICANT AMOUNTS OF DATA

Full History for all objects
over full lifetime
SLOW AND FAST DATA

“Slow” / OLAP data for
batch-processing jobs
“Fast” / OLTP data for
human interaction
INITIALLY SEPARATE

Slow
Data

Fast
Data
A LOT OF OVERLAP

Slow
Data

Fast
Data
THEN ONLY ONE

Slow
Data

Fast
Data
CURRENTLY

7 machines running PostgreSQL
3 Terabytes Data
Thousands of Databases
Largest Table: 120GB
HOW IT BEGAN

Experiment
DESIGN BY THE BOOK
[Diagram: normalized entity-relationship schema. Tables: Scenario, Customer, User, Account, UserAccountAccess, Campaign, Adgroup, Keywords, History. Columns shown: customer_id, user_id, account_id, campaign_id, adgroup_id, keyword_id, day, factor. UserAccountAccess is keyed by (account_id, user_id).]
MORE CUSTOMERS – MORE DATA
PARTITIONING
All Accounts
Account 1 – Rec 1
Account 2 – Rec 1
Account 1 – Rec 2
Account 3 – Rec 1

Account 2 – Rec 2
Account 2 – Rec 3
Account 1 – Rec 3

Account 3 – Rec 2
PARTITIONING
Account 1

Account 2

Account 3

Account 1 – Rec 1

Account 2 – Rec 1

Account 3 – Rec 1

Account 1 – Rec 2

Account 2 – Rec 2

Account 3 – Rec 2

Account 1 – Rec 3

Account 2 – Rec 3

Account 3 – Rec 3
PARTITION WITH INHERITANCE

SELECT

Child

Parent

INSERT

Child

CHECK CONSTRAINTS

Child
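A minimal sketch of what inheritance-based partitioning looks like in practice, written as Python helpers that only build SQL text. The table name `history`, the column `account_id`, the `_account_N` naming scheme, and the helper names are assumptions for illustration; the talk does not show its actual statements. The `%s` placeholders assume a psycopg2-style driver.

```python
def child_table_name(parent: str, account_id: int) -> str:
    """Name of the child table holding one account's rows (naming is assumed)."""
    return f"{parent}_account_{int(account_id)}"


def child_table_ddl(parent: str, account_id: int) -> str:
    """DDL for a child table that inherits from `parent`.

    The CHECK constraint is what lets the planner skip this child when a
    query filters on a different account_id (constraint exclusion).
    """
    if not parent.isidentifier():
        raise ValueError(f"unsafe table name: {parent!r}")
    child = child_table_name(parent, account_id)
    return (
        f"CREATE TABLE {child} "
        f"(CHECK (account_id = {int(account_id)})) "
        f"INHERITS ({parent});"
    )


def insert_sql(parent: str, account_id: int, columns: list[str]) -> str:
    """Parameterized INSERT aimed at the child table, not the parent."""
    placeholders = ", ".join(["%s"] * len(columns))
    return (
        f"INSERT INTO {child_table_name(parent, account_id)} "
        f"({', '.join(columns)}) VALUES ({placeholders});"
    )
```

A SELECT against the parent sees the rows of all children, while the application has to pick the child table for every INSERT. That routing is the application-side effort the slide hints at.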
ISOLATE ACCOUNTS

One DB

Many DBs
PARTITIONING VIA DATABASES

Excellent horizontal scaling
Easy cloning
pg_dump/pg_restore
Some Overhead
No direct references
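The "easy cloning" point can be sketched as a helper that assembles the command lines for copying one account database into another. This is an illustration under assumptions: the database and file names are hypothetical, connection settings are expected to come from the environment (for example `PGHOST`), and the talk does not show the commands actually used.

```python
def clone_commands(source_db: str, target_db: str, dump_file: str, jobs: int = 4) -> list[list[str]]:
    """Command lines to clone one database via pg_dump / pg_restore.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    return [
        # Custom-format archive: compressed, and restorable in parallel.
        ["pg_dump", "--format=custom", "--file", dump_file, source_db],
        # The target database must exist before pg_restore connects to it.
        ["createdb", target_db],
        # --jobs restores several tables/indexes concurrently.
        ["pg_restore", "--jobs", str(jobs), "--dbname", target_db, dump_file],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order. Because every account lives in its own database, the same three steps work for moving an account to another machine.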
WHY NOT SCHEMAS?

More lightweight
Full References
No easy cloning
No schemas inside schemas
SETUP

main

machine-1

machine-0
machine-2
DB HARDWARE

Data > RAM
⇒ High I/O
EC2?
MIGRATION TO EC2

Must migrate all/most machines
No PostgreSQL in RDS
DB Instances run 24/7 ⇒ costly
EBS Performance limited
EBS I/O LIMITED
[Bar chart: sequential write and sequential read throughput in MB/s (axis 0 to 900) for AWS Instance Storage, AWS EBS (Raid-0), SSD (Raid0), and real 15k SAS2 (Raid-10).]
DEDICATED MACHINES

Moderate CPU / RAM
Fast Disks
Battery-backed caching controller
ALTERNATIVE HW

Use bigger (and slower) SATA drives
Evaluate EC2+EBS in production
SSDs
HARDWARE FAILS

Replication

Master

Slave

Availability
Query Load Balancing
REPLICATION
Account DBs

Main DB
master-1

master

slave-1

master-2

slave-2

slave
BACKUPS

pg_dump
compressed

Backup Server
REPLICATION
Account DBs

Main DB
master-1

master

slave-1

master-2

slave-2

slave
REPLICATION
Account DBs

Main DB
master-1

master

master-3

master-2

master-4

slave
DISASTER RESTORE

concurrent
pg_restore

Backup Server
PERFORMANCE PROBLEMS
Too many concurrent full table scans
From 300MB/s to 30MB/s
MORE CONCURRENT
QUERIES

LONGER QUERY RUNTIME
DIFFERENT APPS

Web App
Server

Compute
Cluster

Many fast
queries

Few very
slow queries
DIFFERENT APPS
Semaphore

Web App
Server

Many fast queries

Compute
Cluster

Few very slow queries

Simple counting semaphore using Advisory Locks
Implemented in the application
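A counting semaphore on top of advisory locks could look roughly like the sketch below. It is not the implementation from the talk: the class name, the `key` and `slots` parameters, and the polling loop are assumptions. It uses the two-integer forms of `pg_try_advisory_lock` and `pg_advisory_unlock`, expects a DB-API cursor with psycopg2-style `%s` placeholders, and relies on session-level locks, so the connection must stay open while a slot is held.

```python
import time


class AdvisoryLockSemaphore:
    """Counting semaphore: at most `slots` sessions hold a slot at once.

    Each slot is one advisory lock identified by the pair (key, slot).
    """

    def __init__(self, cursor, key: int, slots: int):
        self.cursor = cursor
        self.key = key
        self.slots = slots
        self.held = None  # slot number held by this session, if any

    def try_acquire(self) -> bool:
        """Grab the first free slot without blocking; False if all are taken."""
        for slot in range(self.slots):
            self.cursor.execute(
                "SELECT pg_try_advisory_lock(%s, %s)", (self.key, slot)
            )
            if self.cursor.fetchone()[0]:
                self.held = slot
                return True
        return False

    def acquire(self, poll_seconds: float = 1.0) -> None:
        """Block (by polling) until a slot becomes free."""
        while not self.try_acquire():
            time.sleep(poll_seconds)

    def release(self) -> None:
        """Give the slot back so another session can start its query."""
        if self.held is not None:
            self.cursor.execute(
                "SELECT pg_advisory_unlock(%s, %s)", (self.key, self.held)
            )
            self.held = None
```

A compute job would call `acquire()` before a slow query and `release()` afterwards. If the job crashes, PostgreSQL drops session-level advisory locks when the connection closes, so the slot frees itself.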
BULK INSERTS

INSERT
20k – 80k
per sec

50M
BULK INSERT BEST PRACTICE

COPY instead of INSERT
Drop indexes + recreate
Truncate
COPY into a new table, swap + drop
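The last bullet, COPY into a new table and then swap, can be sketched as follows. The helpers only prepare the CSV payload and the SQL text; the `_new` / `_old` suffixes and the function names are assumptions, and index creation is left as a comment because the talk does not show its index definitions.

```python
import csv
import io


def rows_to_copy_buffer(rows) -> io.StringIO:
    """Serialize rows as CSV text suitable for COPY ... FROM STDIN."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)
    return buf


def rewrite_statements(table: str) -> list[str]:
    """SQL for a full table rewrite: load a fresh copy, then swap names."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    new, old = f"{table}_new", f"{table}_old"
    return [
        # Same columns, defaults and CHECK constraints, but no indexes yet,
        # so the bulk load does not pay for index maintenance.
        f"CREATE TABLE {new} (LIKE {table} INCLUDING DEFAULTS INCLUDING CONSTRAINTS);",
        f"COPY {new} FROM STDIN WITH (FORMAT csv);",
        # (recreate indexes on the new table here, after the load)
        "BEGIN;",
        f"ALTER TABLE {table} RENAME TO {old};",
        f"ALTER TABLE {new} RENAME TO {table};",
        f"DROP TABLE {old};",
        "COMMIT;",
    ]
```

With psycopg2, for example, the buffer and the COPY statement could be handed to `cursor.copy_expert(sql, buf)`. The renames run inside one transaction, so readers see either the old table or the new one.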
SIGNUP PROBLEMS

Adspert
Service

Signup
CREATE
DATABASE

Up to 5-10 min
PRE-CREATE DATABASES

Create DBs ahead of time
New signups rename DBs
Periodically create new
Fall back to direct create
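The four steps above can be sketched as a small pool that hands out SQL statements. Everything named here is hypothetical: the `spare_NNNN` naming, the `account_template` template database, and the class itself. The talk does not say whether a template database is used.

```python
from collections import deque


class SpareDatabasePool:
    """Tracks pre-created databases and emits the SQL to claim or refill them."""

    def __init__(self, template: str = "account_template"):
        self.template = template
        self.spares = deque()
        self.counter = 0

    def refill_sql(self, target: int) -> list[str]:
        """Statements for the periodic job that tops the pool up to `target`."""
        stmts = []
        while len(self.spares) < target:
            self.counter += 1
            name = f"spare_{self.counter:04d}"
            self.spares.append(name)
            stmts.append(f"CREATE DATABASE {name} TEMPLATE {self.template};")
        return stmts

    def claim_sql(self, account_db: str) -> str:
        """Statement run at signup: rename a spare, or create directly."""
        if self.spares:
            spare = self.spares.popleft()
            # A rename is a quick catalog change compared with a full create.
            return f"ALTER DATABASE {spare} RENAME TO {account_db};"
        # Pool exhausted: fall back to the slow path.
        return f"CREATE DATABASE {account_db} TEMPLATE {self.template};"
```

A real pool would keep its state in the main database rather than in memory, and `ALTER DATABASE ... RENAME` requires that nobody is connected to the spare at that moment.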
CONCLUDING …

Partitioning into Databases
Physical Hardware
Check out advisory locks
THANKS FOR LISTENING

QUESTIONS?

Scaling our SaaS backend with PostgreSQL

Editor's notes

  1. Hi, I’m Oliver, I’m a software developer, currently heading the development team at Bidmanagement GmbH in Berlin.
  2. I’m going to talk about PostgreSQL. Not so much about the DBMS itself, but more about how we’re using it as the main datastore in our system.
  3. About how in our company we're running a large PostgreSQL installation. How we’ve grown our setup.
  4. Meeting notes (27/10/13 11:10): very popular; billions of dollars; very important online marketing channel.
  5. Google provides a very extensive API
  6. Meeting notes (23/10/13 22:22): The different kinds of data we store can be largely separated into two groups.
  7. … And we decided to go with PostgreSQL, because it has been our go-to tool for storing data for many years. Problems from time to time, but we never looked back.
  8. But it began much smaller …
  9. Straightforward approach. Nobody thought of scaling.
  10. Pilots were successful and we started to acquire customers. Soon there were more than 10 million rows in some tables. Query performance lagged (many full table scans). We did not want to scale vertically, because we aspired to much bigger growth (also: expensive). Meeting notes (24/10/13 20:45): vertically.
  11. PostgreSQL supports partitioning via inheritance [insert scheme]. Use CHECK constraints to tell the query planner where to look. Inserts into the parent table are not routed to a child automatically, so rows must be inserted into the matching child table. A lot of effort goes into application logic. We tried it on one table and weren’t convinced.
  12. One main DB with non-account-specific data, currently ~1-2 GB. Several machines dedicated to account databases, with 50-1000 DBs per machine. PostgreSQL 9.0 and 9.3 run on each machine, which allows us to migrate one DB after another.
  13. The partitioning scheme allows easy horizontal scaling ⇒ more machines. But which? The dataset does not fit in RAM ⇒ high I/O requirements. AWS EC2? We would have to migrate all/most machines due to latency. DB instances run 24/7 ⇒ costly. EBS performance is limited (GBit Ethernet). [ec2 / ebs performance numbers vs. physical] Meeting notes (24/10/13 20:45): Add: not many cores.
  14. Not that much elasticity required. As a B2B company our growth is more predictable. Batch processing of expensive backend jobs. 1 year of an EC2 instance ≅ buying one physical server. Using mid-sized machines. Good price/value ratio.
  15. SATA: 600 GB vs. 3 TB. EC2: performance and latency unclear; evaluate to make an informed decision. SSDs: expensive. Reliable? RAID?
  16. But when things go awry and data gets deleted …
  17. Big cheap HDDs
  18. But when things go awry and data gets deleted …
  19. But when things go awry and data gets deleted …
  20. The main DB is still replicated to enable quick failover. Here we can’t afford extended downtime.
  21. Capacity doubled, cost reduced by 40%. The more servers, the faster the restore. Gbit Ethernet on the backup server is the limiting factor.
  22. From sequential reads to random reads. Feedback loop:
  23. Webapp queries with humans waiting are quite fast. The problematic queries are done by the analysis jobs: frequent full table scans and queries with huge results. We need a way to synchronize queries and control concurrency. We could use a connection pooler, or an external synchronization mechanism, e.g. ZooKeeper.
  24. Webapp queries with humans waiting are quite fast. The problematic queries are done by the analysis jobs: frequent full table scans and queries with huge results. We need a way to synchronize queries and control concurrency. We could use a connection pooler, or an external synchronization mechanism, e.g. ZooKeeper.
  25. We rewrite the history every day (for various reasons): conversions arrive up to 30 days later, and campaigns are added to optimization. For most accounts this is <1M records, for some 10-100M. We achieve up to 80k inserts/sec. The network is the bottleneck [check this].
  26. We use COPY for all bulk inserts, even small ones. Indexes are dropped and recreated with simple PL/pgSQL functions. For complete table rewrites, note that TRUNCATE is not MVCC-safe: concurrent transactions holding an older snapshot see the table as empty.
  27. We added a self-service signup: a 2-minute process to add an AdWords account to the system. OAuth user info ⇒ optimization bootstrap. Biggest problem: CREATE DATABASE can take several minutes, depending on the current amount of write activity.
  28. We now always keep 10-20 spare databases in stock. This way we also control the target host for new databases. Take care to avoid race conditions when applying schema changes.