SlideShare a Scribd company logo
1 of 46
The Challenges of
Distributing Postgres:
A Citus Story
Ozgun Erdogan
DataEngConf NYC | October 2017
Developers Love Postgres
PostgreSQL
MySQL
MongoDB
SQL Server +
Oracle
RDBMS: PostgreSQL, MySQL, Microsoft SQL Server, Oracle
Ozgun Erdogan | DataEngConf NYC 2017
I love Postgres, too
3 Ozgun Erdogan | DataEngConf NYC 2017
Ozgun Erdogan
CTO of Citus Data
Distributed Systems
Distributed Databases
Formerly of Amazon
Love drinking margaritas
4
Our mission at Citus Data
5 Ozgun Erdogan | DataEngConf NYC 2017
Make it so SaaS businesses
never have to worry about
scaling their database again
What is the Citus database?
1.Scales out PostgreSQL
2.Extension to PostgreSQL
3.Available in 3 Ways
Ozgun Erdogan | DataEngConf NYC 2017
• Using sharding & replication
• Query engine parallelizes SQL queries across many nodes
• Using PostgreSQL extension APIs
Citus, Packaged Three Ways
Ozgun Erdogan | DataEngConf NYC 2017
Open
Source
Enterprise
Software
Fully-Managed
Database as a Service
github.com/citusdata/citus
Simplified	Citus	Architecture
3 Challenges Distributing Postgres
1. PostgreSQL and High Availability
2. To build new distributed database—or to fork?
3. Distributed transactions
Ozgun Erdogan | DataEngConf NYC 2017
PostgreSQL &
High Availability (HA)
Designing for a Cloud-native world
1
Why is High Availability hard?
PostgreSQL replication uses one primary &
multiple secondary nodes. Two challenges:
1. Most Postgres clients aren’t smart. When the
primary fails, they retry the same IP.
2. Postgres replicates entire state. This makes it
resource intensive to reconstruct new nodes from a
primary.
Ozgun Erdogan | DataEngConf NYC 2017
Database Failures Should Be Transparent
Ozgun Erdogan | DataEngConf NYC 2017
Database Failures Shouldn’t Be a Big Deal
1. PostgreSQL streaming replication to replicate from
primary to secondary. Back up to S3.
2. Volume level replication to replicate to secondary’s
volume. Back up to S3.
3. Incremental backups to S3. Reconstruct secondary
nodes from S3.
Ozgun Erdogan | DataEngConf NYC 2017
3 Methods for HA & Backups in Postgres
Postgres - Streaming Replication (1)
Write-ahead	logs		
(streaming	repl.)
Table foo
Primary	–
PostgreSQL	
streaming	repl.	
Table bar
WAL logs
Table foo
Table bar
WAL logs
Secondary	–
PostgreSQL	
streaming	repl.	
Monitoring	Agents	-
streaming	repl.	
setup	&	auto	failover
S3	/	Blob	Storage
(Encrypted)	
Backup
Process
Ozgun Erdogan | DataEngConf NYC 2017
Postgres – AWS RDS & Azure (2)
Postgres
Primary
Monitoring	Agents
(Auto	node	failover)
Persistent	Volume
Postgres
Standby	
S3	/	Blob	Storage
(Encrypted)	
Table foo
Table bar
WAL logs
Table foo
Table bar
WAL logs
Backup	process
Backup
Process
Ozgun Erdogan | DataEngConf NYC 2017
Persistent	Volume
Postgres – Reconstruct from WAL (3)
Postgres
Primary
Monitoring	Agents
(Auto	node	failover)
Persistent	Volume
Postgres
Secondary
Backup
Process
S3	/	Blob	Storage
(Encrypted)	
Table foo
Table bar
WAL logs
Persistent	Volume
Table foo
Table bar
WAL logs
Backup	process
Ozgun Erdogan | DataEngConf NYC 2017
WHO DOES THIS? PRIMARY BENEFITS
Streaming Replication
(local / ephemeral disk)
On-prem
Manual EC2
Simple to set up
Direct I/O: High I/O & large storage
Disk Mirroring
RDS
Azure Preview
Works for MySQL and PostgreSQL
Data durability in cloud environments
Reconstruct from WAL
Heroku
Citus Data
Enables Fork and PITR
Node reconstruction in background
(Data durability in cloud environments)
How do these approaches compare?
17 Ozgun Erdogan | DataEngConf NYC 2017
Summary
• In PostgreSQL, a database node’s state gets
replicated in its entirety. The replication can be set up
in three ways.
• Reconstructing a secondary node from S3 makes
bringing up or shooting down nodes easy.
• When you shard your database, the state you need to
replicate per node becomes smaller.
Ozgun Erdogan | DataEngConf NYC 2017
PostgreSQL has a
huge ecosystem.
How do you keep up with it?
2
3 ways to build a distributed database
1. Build a distributed database from scratch
2. Middleware sharding (mimic the parser)
3. Fork your favorite database (like PostgreSQL)
Ozgun Erdogan | DataEngConf NYC 2017
Example Transaction Block
Ozgun Erdogan | DataEngConf NYC 2017
Postgres Features, Tools & Frameworks
• PostgreSQL manual (US Letter)
• Clients for diff programming
languages
• ORMs, libraries, GUIs
• Tools (dump, restore, analyze)
• New features
Ozgun Erdogan | DataEngConf NYC 2017
At First, Forked PostgreSQL with Style
Ozgun Erdogan | DataEngConf NYC 2017
Two Stage Query Optimization
1. Plan to minimize network I/O
2. Nodes talk to each other using SQL over libpq
3. Learned to cooperate with planner / executor bit by bit
(Volcano style executor)
Ozgun Erdogan | DataEngConf NYC 2017
Citus Architecture (Simplified)
25
SELECT avg(revenue)
FROM sales
Coordinator
SELECT sum(revenue), count(revenue)
FROM table_1001
SELECT sum … FROM table_1003
Worker node 1
Table metadata
Table_1001
Table_1003
SELECT sum … FROM table_1002
SELECT sum … FROM table_1004
Worker node 2
Table_1002
Table_1004
Worker node N
.
.
.
.
.
.
Each node PostgreSQL with Citus installed
1 shard = 1 PostgreSQL table
Ozgun Erdogan | DataEngConf NYC 2017
Unfork Citus using Extension APIs
CREATE EXTENSION citus;
• System catalogs – Distributed metadata
• Planner hook – Insert, Update, Delete, Select
• Executor hook – Insert, Update, Delete, Select
• Utility hook – Alter Table, Create Index, Vacuum, etc.
• Transaction & resources handling – file descriptors, etc.
• Background worker process – Maintenance processes
(distributed deadlock detection, task tracker, etc.)
• Logical decoding – Online data migrations
Ozgun Erdogan | DataEngConf NYC 2017
PostgreSQL has transactions.
How to handle distributed transactions
3
BEGIN
INSERT
UPDATE
SELECT
COMMIT
ROLLBACK
Consistency in Distributed Databases
1. 2PC: All participating nodes need to be up
2. Paxos: Achieves consensus with quorum
3. Raft: More understandable alternative to
Paxos
Ozgun Erdogan | DataEngConf NYC 2017
Concurrency in Distributed Databases
Ozgun Erdogan | DataEngConf NYC 2017
LocksLocks
What	is	a	Lock?
• Protects	against	concurrent	modifications.
• Locks	are	released	at	the	end	of	a	transaction.
Deadlocks
Transactions	Block	on	1st Conflicting	LockWhat is a lock?
Protects against concurrent modifications
Locks released at end of transaction
BEGIN;
UPDATE data SET y = 2 WHERE x = 1;
<obtained lock on rows with x = 1>
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = 5 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
Transactions	and	Concurrency
• Transactions	that	don’t	modify	the	same	row	
can	run	concurrently.
Transactions block on 1st lock that conflicts
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
Transactions	and	Concurrency
• Transactions	that	don’t	modify	the	same	row	
can	run	concurrently.
Transactions block on 1st lock that conflicts
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
(Distributed) deadlock!
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
But what if they start blocking each other?
Transactions	and	Concurrency
• Transactions	that	don’t	modify	the	same	row	
can	run	concurrently.
Transactions block on 1st lock that conflicts
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
(Distributed) deadlock!
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
But what if they start blocking each other?Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that
are waiting for each other.
Transactions	and	Concurrency
• Transactions	that	don’t	modify	the	same	row	
can	run	concurrently.
Transactions block on 1st lock that conflicts
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
(Distributed) deadlock!
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
But what if they start blocking each other?Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that
are waiting for each other.
Deadlock detection in PostgreSQL
Transactions are cancelled until the cycle is gone
Transactions	and	Concurrency
• Transactions	that	don’t	modify	the	same	row	
can	run	concurrently.
Transactions block on 1st lock that conflicts
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
(Distributed) deadlock!
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
But what if they start blocking each other?Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that
are waiting for each other.
Deadlock detection in PostgreSQL
Transactions are cancelled until the cycle is gone
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Transactions	and	Concurrency
• Transactions	that	don’t	modify	the	same	row	
can	run	concurrently.
Transactions block on 1st lock that conflicts
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
(Distributed) deadlock!
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
But what if they start blocking each other?Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that
are waiting for each other.
Deadlock detection in PostgreSQL
Transactions are cancelled until the cycle is gone
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
PostgreSQL’s deadlock detector still works
Transactions	and	Concurrency
• Transactions	that	don’t	modify	the	same	row	
can	run	concurrently.
Transactions block on 1st lock that conflicts
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
(Distributed) deadlock!
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
But what if they start blocking each other?Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that
are waiting for each other.
Deadlock detection in PostgreSQL
Transactions are cancelled until the cycle is gone
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
PostgreSQL’s deadlock detector still works
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
When deadlocks span across node, PostgreSQL cannot help us
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
When deadlocks span across node, PostgreSQL cannot help us
Transactions	and	Concurrency
• Transactions	that	don’t	modify	the	same	row	
can	run	concurrently.
Transactions block on 1st lock that conflicts
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
(Distributed) deadlock!
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
But what if they start blocking each other?Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that
are waiting for each other.
Deadlock detection in PostgreSQL
Transactions are cancelled until the cycle is gone
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
PostgreSQL’s deadlock detector still works
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
When deadlocks span across node, PostgreSQL cannot help us
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
When deadlocks span across node, PostgreSQL cannot help us
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlock detection in Citus 7
Citus 7 adds distributed deadlock detection
Transactions	and	Concurrency
• Transactions	that	don’t	modify	the	same	row	
can	run	concurrently.
Transactions block on 1st lock that conflicts
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
(Distributed) deadlock!
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
But what if they start blocking each other?Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that
are waiting for each other.
Deadlock detection in PostgreSQL
Transactions are cancelled until the cycle is gone
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
Citus delegates transactions to nodes
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
PostgreSQL’s deadlock detector still works
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
When deadlocks span across node, PostgreSQL cannot help us
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlocks in Citus
When deadlocks span across node, PostgreSQL cannot help us
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlock detection in Citus 7
Citus 7 adds distributed deadlock detection
Firstname Lastname | Citus Data | Meeting Name | Month Year
Deadlock detection in Citus 7
Citus 7 adds distributed deadlock detection.
Distributed transactions are… a
complex topic
• Most articles on distributed transactions focus on data
consistency.
• Data consistency is only one side of the coin. If you’re
using a relational database, your application benefits
from another key feature: deadlock detection.
• https://www.citusdata.com/blog/2017/08/31/databases
-and-distributed-deadlocks-a-faq
Ozgun Erdogan | DataEngConf NYC 2017
So now what? We talked about 3
challenges distributing Postgres…
1. PostgreSQL, Replication, High Availability
2. Tradeoffs in different approaches to building a
distributed database—and how we chose
PostgreSQL’s extension APIs
3. Distributed deadlock detection & distributed
transactions
Ozgun Erdogan | DataEngConf NYC 2017
45
“SQL is hard, not impossible, to scale”
© 2017 Citus Data. All right reserved.
ozgun@citusdata.com
Questions?
@citusdata
Ozgun Erdogan
www.citusdata.com

More Related Content

What's hot

【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321maclean liu
 
Optimizer Cost Model MySQL 5.7
Optimizer Cost Model MySQL 5.7Optimizer Cost Model MySQL 5.7
Optimizer Cost Model MySQL 5.7I Goo Lee
 
了解Oracle rac brain split resolution
了解Oracle rac brain split resolution了解Oracle rac brain split resolution
了解Oracle rac brain split resolutionmaclean liu
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraSveta Smirnova
 
MySQL as a Document Store
MySQL as a Document StoreMySQL as a Document Store
MySQL as a Document StoreDave Stokes
 
What's new in PostgreSQL 11 ?
What's new in PostgreSQL 11 ?What's new in PostgreSQL 11 ?
What's new in PostgreSQL 11 ?José Lin
 
Direct SGA access without SQL
Direct SGA access without SQLDirect SGA access without SQL
Direct SGA access without SQLKyle Hailey
 
MySQL Replication Update -- Zendcon 2016
MySQL Replication Update -- Zendcon 2016MySQL Replication Update -- Zendcon 2016
MySQL Replication Update -- Zendcon 2016Dave Stokes
 
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)Dave Stokes
 
MySQL Replication Basics -Ohio Linux Fest 2016
MySQL Replication Basics -Ohio Linux Fest 2016MySQL Replication Basics -Ohio Linux Fest 2016
MySQL Replication Basics -Ohio Linux Fest 2016Dave Stokes
 
PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PgDay.Seoul
 
DBA Commands and Concepts That Every Developer Should Know
DBA Commands and Concepts That Every Developer Should KnowDBA Commands and Concepts That Every Developer Should Know
DBA Commands and Concepts That Every Developer Should KnowAlex Zaballa
 
Database & Technology 1 _ Tom Kyte _ SQL Techniques.pdf
Database & Technology 1 _ Tom Kyte _ SQL Techniques.pdfDatabase & Technology 1 _ Tom Kyte _ SQL Techniques.pdf
Database & Technology 1 _ Tom Kyte _ SQL Techniques.pdfInSync2011
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11gfcamachob
 
Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sSveta Smirnova
 
Oracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptOracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptChien Chung Shen
 
OpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developersOpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developersConnor McDonald
 
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfDatabase & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfInSync2011
 

What's hot (20)

【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
 
Optimizer Cost Model MySQL 5.7
Optimizer Cost Model MySQL 5.7Optimizer Cost Model MySQL 5.7
Optimizer Cost Model MySQL 5.7
 
MySQL 5.5 Guide to InnoDB Status
MySQL 5.5 Guide to InnoDB StatusMySQL 5.5 Guide to InnoDB Status
MySQL 5.5 Guide to InnoDB Status
 
了解Oracle rac brain split resolution
了解Oracle rac brain split resolution了解Oracle rac brain split resolution
了解Oracle rac brain split resolution
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with Galera
 
MySQL as a Document Store
MySQL as a Document StoreMySQL as a Document Store
MySQL as a Document Store
 
What's new in PostgreSQL 11 ?
What's new in PostgreSQL 11 ?What's new in PostgreSQL 11 ?
What's new in PostgreSQL 11 ?
 
Direct SGA access without SQL
Direct SGA access without SQLDirect SGA access without SQL
Direct SGA access without SQL
 
MySQL Replication Update -- Zendcon 2016
MySQL Replication Update -- Zendcon 2016MySQL Replication Update -- Zendcon 2016
MySQL Replication Update -- Zendcon 2016
 
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
 
MySQL Replication Basics -Ohio Linux Fest 2016
MySQL Replication Basics -Ohio Linux Fest 2016MySQL Replication Basics -Ohio Linux Fest 2016
MySQL Replication Basics -Ohio Linux Fest 2016
 
PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개
 
DBA Commands and Concepts That Every Developer Should Know
DBA Commands and Concepts That Every Developer Should KnowDBA Commands and Concepts That Every Developer Should Know
DBA Commands and Concepts That Every Developer Should Know
 
Database & Technology 1 _ Tom Kyte _ SQL Techniques.pdf
Database & Technology 1 _ Tom Kyte _ SQL Techniques.pdfDatabase & Technology 1 _ Tom Kyte _ SQL Techniques.pdf
Database & Technology 1 _ Tom Kyte _ SQL Techniques.pdf
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11g
 
Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]s
 
Oracle ORA Errors
Oracle ORA ErrorsOracle ORA Errors
Oracle ORA Errors
 
Oracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptOracle Database SQL Tuning Concept
Oracle Database SQL Tuning Concept
 
OpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developersOpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developers
 
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfDatabase & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
 

Similar to The Challenges of Distributing Postgres: A Citus Story

MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)Jean-François Gagné
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesDatabricks
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Using R on Netezza
Using R on NetezzaUsing R on Netezza
Using R on NetezzaAjay Ohri
 
String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?Jeremy Schneider
 
NewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDNewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDTony Rogerson
 
2017 AWS DB Day | Amazon Redshift 소개 및 실습
2017 AWS DB Day | Amazon Redshift  소개 및 실습2017 AWS DB Day | Amazon Redshift  소개 및 실습
2017 AWS DB Day | Amazon Redshift 소개 및 실습Amazon Web Services Korea
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Databas...
Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Databas...Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Databas...
Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Databas...djkucera
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Michael Renner
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01Karam Abuataya
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQLScaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQLJim Mlodgenski
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management Systempsathishcs
 
Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Keshav Murthy
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementSingleStore
 

Similar to The Challenges of Distributing Postgres: A Citus Story (20)

MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFrames
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Using R on Netezza
Using R on NetezzaUsing R on Netezza
Using R on Netezza
 
String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?
 
NewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDNewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACID
 
2017 AWS DB Day | Amazon Redshift 소개 및 실습
2017 AWS DB Day | Amazon Redshift  소개 및 실습2017 AWS DB Day | Amazon Redshift  소개 및 실습
2017 AWS DB Day | Amazon Redshift 소개 및 실습
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Databas...
Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Databas...Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Databas...
Collaborate 2011– Leveraging and Enriching the Capabilities of Oracle Databas...
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQLScaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQL
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management System
 
Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data.
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data Management
 

Recently uploaded

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingSelcen Ozturkcan
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

The Challenges of Distributing Postgres: A Citus Story

  • 1. The Challenges of Distributing Postgres: A Citus Story Ozgun Erdogan DataEngConf NYC | October 2017
  • 2. Developers Love Postgres PostgreSQL MySQL MongoDB SQL Server + Oracle RDBMS: PostgreSQL, MySQL, Microsoft SQL Server, Oracle Ozgun Erdogan | DataEngConf NYC 2017
  • 3. I love Postgres, too 3 Ozgun Erdogan | DataEngConf NYC 2017 Ozgun Erdogan CTO of Citus Data Distributed Systems Distributed Databases Formerly of Amazon Love drinking margaritas
  • 4. 4
  • 5. Our mission at Citus Data 5 Ozgun Erdogan | DataEngConf NYC 2017 Make it so SaaS businesses never have to worry about scaling their database again
  • 6. What is the Citus database? 1.Scales out PostgreSQL 2.Extension to PostgreSQL 3.Available in 3 Ways Ozgun Erdogan | DataEngConf NYC 2017 • Using sharding & replication • Query engine parallelizes SQL queries across many nodes • Using PostgreSQL extension APIs
  • 7. Citus, Packaged Three Ways Ozgun Erdogan | DataEngConf NYC 2017 Open Source Enterprise Software Fully-Managed Database as a Service github.com/citusdata/citus
  • 9. 3 Challenges Distributing Postgres 1. PostgreSQL and High Availability 2. To build new distributed database—or to fork? 3. Distributed transactions Ozgun Erdogan | DataEngConf NYC 2017
  • 10. PostgreSQL & High Availability (HA) Designing for a Cloud-native world 1
  • 11. Why is High Availability hard? PostgreSQL replication uses one primary & multiple secondary nodes. Two challenges: 1. Most Postgres clients aren’t smart. When the primary fails, they retry the same IP. 2. Postgres replicates entire state. This makes it resource intensive to reconstruct new nodes from a primary. Ozgun Erdogan | DataEngConf NYC 2017
  • 12. Database Failures Should Be Transparent Ozgun Erdogan | DataEngConf NYC 2017
  • 13. Database Failures Shouldn’t Be a Big Deal 1. PostgreSQL streaming replication to replicate from primary to secondary. Back up to S3. 2. Volume level replication to replicate to secondary’s volume. Back up to S3. 3. Incremental backups to S3. Reconstruct secondary nodes from S3. Ozgun Erdogan | DataEngConf NYC 2017 3 Methods for HA & Backups in Postgres
  • 14. Postgres - Streaming Replication (1) Write-ahead logs (streaming repl.) Table foo Primary – PostgreSQL streaming repl. Table bar WAL logs Table foo Table bar WAL logs Secondary – PostgreSQL streaming repl. Monitoring Agents - streaming repl. setup & auto failover S3 / Blob Storage (Encrypted) Backup Process Ozgun Erdogan | DataEngConf NYC 2017
  • 15. Postgres – AWS RDS & Azure (2) Postgres Primary Monitoring Agents (Auto node failover) Persistent Volume Postgres Standby S3 / Blob Storage (Encrypted) Table foo Table bar WAL logs Table foo Table bar WAL logs Backup process Backup Process Ozgun Erdogan | DataEngConf NYC 2017 Persistent Volume
  • 16. Postgres – Reconstruct from WAL (3) Postgres Primary Monitoring Agents (Auto node failover) Persistent Volume Postgres Secondary Backup Process S3 / Blob Storage (Encrypted) Table foo Table bar WAL logs Persistent Volume Table foo Table bar WAL logs Backup process Ozgun Erdogan | DataEngConf NYC 2017
  • 17. WHO DOES THIS? PRIMARY BENEFITS Streaming Replication (local / ephemeral disk) On-prem Manual EC2 Simple to set up Direct I/O: High I/O & large storage Disk Mirroring RDS Azure Preview Works for MySQL and PostgreSQL Data durability in cloud environments Reconstruct from WAL Heroku Citus Data Enables Fork and PITR Node reconstruction in background (Data durability in cloud environments) How do these approaches compare? 17 Ozgun Erdogan | DataEngConf NYC 2017
  • 18. Summary • In PostgreSQL, a database node’s state gets replicated in its entirety. The replication can be set up in three ways. • Reconstructing a secondary node from S3 makes bringing up or shooting down nodes easy. • When you shard your database, the state you need to replicate per node becomes smaller. Ozgun Erdogan | DataEngConf NYC 2017
  • 19. PostgreSQL has a huge ecosystem. How do you keep up with it? 2
  • 20. 3 ways to build a distributed database 1. Build a distributed database from scratch 2. Middleware sharding (mimic the parser) 3. Fork your favorite database (like PostgreSQL) Ozgun Erdogan | DataEngConf NYC 2017
  • 21. Example Transaction Block Ozgun Erdogan | DataEngConf NYC 2017
  • 22. Postgres Features, Tools & Frameworks • PostgreSQL manual (US Letter) • Clients for diff programming languages • ORMs, libraries, GUIs • Tools (dump, restore, analyze) • New features Ozgun Erdogan | DataEngConf NYC 2017
  • 23. At First, Forked PostgreSQL with Style Ozgun Erdogan | DataEngConf NYC 2017
  • 24. Two Stage Query Optimization 1. Plan to minimize network I/O 2. Nodes talk to each other using SQL over libpq 3. Learned to cooperate with planner / executor bit by bit (Volcano style executor) Ozgun Erdogan | DataEngConf NYC 2017
  • 25. Citus Architecture (Simplified) 25 SELECT avg(revenue) FROM sales Coordinator SELECT sum(revenue), count(revenue) FROM table_1001 SELECT sum … FROM table_1003 Worker node 1 Table metadata Table_1001 Table_1003 SELECT sum … FROM table_1002 SELECT sum … FROM table_1004 Worker node 2 Table_1002 Table_1004 Worker node N . . . . . . Each node PostgreSQL with Citus installed 1 shard = 1 PostgreSQL table Ozgun Erdogan | DataEngConf NYC 2017
  • 26. Unfork Citus using Extension APIs CREATE EXTENSION citus; • System catalogs – Distributed metadata • Planner hook – Insert, Update, Delete, Select • Executor hook – Insert, Update, Delete, Select • Utility hook – Alter Table, Create Index, Vacuum, etc. • Transaction & resources handling – file descriptors, etc. • Background worker process – Maintenance processes (distributed deadlock detection, task tracker, etc.) • Logical decoding – Online data migrations Ozgun Erdogan | DataEngConf NYC 2017
  • 27. PostgreSQL has transactions. How to handle distributed transactions 3
  • 29. Consistency in Distributed Databases 1. 2PC: All participating nodes need to be up 2. Paxos: Achieves consensus with quorum 3. Raft: More understandable alternative to Paxos Ozgun Erdogan | DataEngConf NYC 2017
  • 30. Concurrency in Distributed Databases Ozgun Erdogan | DataEngConf NYC 2017
  • 33. Transactions Block on 1st Conflicting LockWhat is a lock? Protects against concurrent modifications Locks released at end of transaction BEGIN; UPDATE data SET y = 2 WHERE x = 1; <obtained lock on rows with x = 1> COMMIT; <all locks released> BEGIN; UPDATE data SET y = 5 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT;
  • 34. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. Transactions block on 1st lock that conflicts BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released> BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT;
  • 35. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. Transactions block on 1st lock that conflicts BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released> BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT; (Distributed) deadlock! BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; UPDATE data SET y = y + 1 WHERE x = 2; BEGIN; UPDATE data SET y = y - 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; But what if they start blocking each other?
  • 36. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. Transactions block on 1st lock that conflicts BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released> BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT; (Distributed) deadlock! BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; UPDATE data SET y = y + 1 WHERE x = 2; BEGIN; UPDATE data SET y = y - 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; But what if they start blocking each other?Deadlock detection in PostgreSQL Deadlock detection builds a graph of processes that are waiting for each other.
  • 37. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. Transactions block on 1st lock that conflicts BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released> BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT; (Distributed) deadlock! BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; UPDATE data SET y = y + 1 WHERE x = 2; BEGIN; UPDATE data SET y = y - 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; But what if they start blocking each other?Deadlock detection in PostgreSQL Deadlock detection builds a graph of processes that are waiting for each other. Deadlock detection in PostgreSQL Transactions are cancelled until the cycle is gone
  • 38. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. Transactions block on 1st lock that conflicts BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released> BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT; (Distributed) deadlock! BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; UPDATE data SET y = y + 1 WHERE x = 2; BEGIN; UPDATE data SET y = y - 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; But what if they start blocking each other?Deadlock detection in PostgreSQL Deadlock detection builds a graph of processes that are waiting for each other. Deadlock detection in PostgreSQL Transactions are cancelled until the cycle is gone Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes
  • 39. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. Transactions block on 1st lock that conflicts BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released> BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT; (Distributed) deadlock! BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; UPDATE data SET y = y + 1 WHERE x = 2; BEGIN; UPDATE data SET y = y - 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; But what if they start blocking each other?Deadlock detection in PostgreSQL Deadlock detection builds a graph of processes that are waiting for each other. Deadlock detection in PostgreSQL Transactions are cancelled until the cycle is gone Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus PostgreSQL’s deadlock detector still works
  • 40. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. Transactions block on 1st lock that conflicts BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released> BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT; (Distributed) deadlock! BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; UPDATE data SET y = y + 1 WHERE x = 2; BEGIN; UPDATE data SET y = y - 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; But what if they start blocking each other?Deadlock detection in PostgreSQL Deadlock detection builds a graph of processes that are waiting for each other. Deadlock detection in PostgreSQL Transactions are cancelled until the cycle is gone Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus PostgreSQL’s deadlock detector still works Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus When deadlocks span across node, PostgreSQL cannot help us Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus When deadlocks span across node, PostgreSQL cannot help us
  • 41. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. Transactions block on 1st lock that conflicts BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released> BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT; (Distributed) deadlock! BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; UPDATE data SET y = y + 1 WHERE x = 2; BEGIN; UPDATE data SET y = y - 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; But what if they start blocking each other?Deadlock detection in PostgreSQL Deadlock detection builds a graph of processes that are waiting for each other. Deadlock detection in PostgreSQL Transactions are cancelled until the cycle is gone Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus PostgreSQL’s deadlock detector still works Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus When deadlocks span across node, PostgreSQL cannot help us Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus When deadlocks span across node, PostgreSQL cannot help us Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlock detection in Citus 7 Citus 7 adds distributed deadlock detection
  • 42. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. Transactions block on 1st lock that conflicts BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released> BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT; (Distributed) deadlock! BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; UPDATE data SET y = y + 1 WHERE x = 2; BEGIN; UPDATE data SET y = y - 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; But what if they start blocking each other?Deadlock detection in PostgreSQL Deadlock detection builds a graph of processes that are waiting for each other. Deadlock detection in PostgreSQL Transactions are cancelled until the cycle is gone Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus Citus delegates transactions to nodes Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus PostgreSQL’s deadlock detector still works Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus When deadlocks span across node, PostgreSQL cannot help us Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlocks in Citus When deadlocks span across node, PostgreSQL cannot help us Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlock detection in Citus 7 Citus 7 adds distributed deadlock detection Firstname Lastname | Citus Data | Meeting Name | Month Year Deadlock detection in Citus 7 Citus 7 adds distributed deadlock detection.
  • 43. Distributed transactions are… a complex topic • Most articles on distributed transactions focus on data consistency. • Data consistency is only one side of the coin. If you’re using a relational database, your application benefits from another key feature: deadlock detection. • https://www.citusdata.com/blog/2017/08/31/databases -and-distributed-deadlocks-a-faq Ozgun Erdogan | DataEngConf NYC 2017
  • 44. So now what? We talked about 3 challenges distributing Postgres… 1. PostgreSQL, Replication, High Availability 2. Tradeoffs in different approaches to building a distributed database—and how we chose PostgreSQL’s extension APIs 3. Distributed deadlock detection & distributed transactions Ozgun Erdogan | DataEngConf NYC 2017
  • 45. 45 “SQL is hard, not impossible, to scale”
  • 46. © 2017 Citus Data. All right reserved. ozgun@citusdata.com Questions? @citusdata Ozgun Erdogan www.citusdata.com