The Challenges of Distributing Postgres: A Citus Story

Ozgun Erdogan
DataEngConf NYC | October 2017

Developers Love Postgres
[Chart comparing PostgreSQL, MySQL, MongoDB, and SQL Server + Oracle]
RDBMS: PostgreSQL, MySQL, Microsoft SQL Server, Oracle

I love Postgres, too
Ozgun Erdogan
CTO of Citus Data
Distributed Systems
Distributed Databases
Formerly of Amazon
Love drinking margaritas

Our mission at Citus Data
Make it so SaaS businesses
never have to worry about
scaling their database again

What is the Citus database?
1. Scales out PostgreSQL
   • Using sharding & replication
   • Query engine parallelizes SQL queries across many nodes
2. Extension to PostgreSQL
   • Using PostgreSQL extension APIs
3. Available in 3 ways

Citus, Packaged Three Ways
• Open Source: github.com/citusdata/citus
• Enterprise Software
• Fully-Managed Database as a Service

3 Challenges Distributing Postgres
1. PostgreSQL and High Availability
2. Build a new distributed database, or fork?
3. Distributed transactions

1. PostgreSQL & High Availability (HA)
Designing for a cloud-native world

Why is High Availability hard?
PostgreSQL replication uses one primary & multiple secondary nodes. Two challenges:
1. Most Postgres clients aren’t smart. When the primary fails, they retry the same IP.
2. Postgres replicates its entire state. This makes it resource intensive to reconstruct new nodes from a primary.
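A rough sketch of what a "smarter" client could do about the first challenge: instead of retrying one IP, walk a list of candidate hosts and pick the first one that reports itself as the primary. The function name `find_primary`, the host names, and the injected `is_primary` callable are all hypothetical; in a real client the check would open a connection and run something like `SELECT pg_is_in_recovery()`.

```python
from typing import Callable, Iterable, Optional


def find_primary(hosts: Iterable[str],
                 is_primary: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable host that reports being the primary.

    `is_primary` is an injected check (an assumption of this sketch); it
    should raise ConnectionError when the host cannot be reached.
    """
    for host in hosts:
        try:
            if is_primary(host):
                return host
        except ConnectionError:
            # Host is down: move on instead of retrying the same address.
            continue
    return None


# Simulated cluster after a failover: the old primary is unreachable and
# a former secondary has been promoted. Host names are made up.
cluster_state = {"10.0.0.1": "down", "10.0.0.2": "primary", "10.0.0.3": "secondary"}


def fake_is_primary(host: str) -> bool:
    state = cluster_state[host]
    if state == "down":
        raise ConnectionError(host)
    return state == "primary"


chosen = find_primary(["10.0.0.1", "10.0.0.2", "10.0.0.3"], fake_is_primary)
print(chosen)
```

The point of the sketch is only the control flow: failover becomes transparent to the application when the client, or something in front of it, can locate the new primary.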

Database Failures Should Be Transparent

Database Failures Shouldn’t Be a Big Deal
3 Methods for HA & Backups in Postgres
1. PostgreSQL streaming replication to replicate from primary to secondary. Back up to S3.
2. Volume-level replication to replicate to the secondary’s volume. Back up to S3.
3. Incremental backups to S3. Reconstruct secondary nodes from S3.

Postgres - Streaming Replication (1)
[Diagram: Primary PostgreSQL node (Table foo, Table bar, WAL logs) sends write-ahead logs via streaming replication to a secondary PostgreSQL node (Table foo, Table bar, WAL logs). Monitoring agents handle streaming replication setup & auto failover. A backup process writes to encrypted S3 / blob storage.]

Postgres – AWS RDS & Azure (2)
[Diagram: Postgres primary and Postgres standby, each on a persistent volume holding Table foo, Table bar, and WAL logs. Monitoring agents handle auto node failover. A backup process writes to encrypted S3 / blob storage.]
Postgres – Reconstruct from WAL (3)
[Diagram: Postgres primary and Postgres secondary, each on a persistent volume holding Table foo, Table bar, and WAL logs. Monitoring agents handle auto node failover. A backup process writes to encrypted S3 / blob storage.]

How do these approaches compare?
Streaming Replication (local / ephemeral disk)
   • Who does this? On-prem, manual EC2
   • Primary benefits: Simple to set up. Direct I/O: high I/O & large storage
Disk Mirroring
   • Who does this? RDS, Azure Preview
   • Primary benefits: Works for MySQL and PostgreSQL. Data durability in cloud environments
Reconstruct from WAL
   • Who does this? Heroku, Citus Data
   • Primary benefits: Enables fork and PITR. Node reconstruction in background. (Data durability in cloud environments)

Summary
• In PostgreSQL, a database node’s state gets replicated in its entirety. The replication can be set up in three ways.
• Reconstructing a secondary node from S3 makes bringing up or shooting down nodes easy.
• When you shard your database, the state you need to replicate per node becomes smaller.

2. PostgreSQL has a huge ecosystem.
How do you keep up with it?

3 ways to build a distributed database
1. Build a distributed database from scratch
2. Middleware sharding (mimic the parser)
3. Fork your favorite database (like PostgreSQL)

Example Transaction Block

Postgres Features, Tools & Frameworks
• PostgreSQL manual (US Letter)
• Clients for different programming languages
• ORMs, libraries, GUIs
• Tools (dump, restore, analyze)
• New features

At First, Forked PostgreSQL with Style

Two Stage Query Optimization
1. Plan to minimize network I/O
2. Nodes talk to each other using SQL over libpq
3. Learned to cooperate with planner / executor bit by bit (Volcano style executor)

Citus Architecture (Simplified)
[Diagram: The coordinator holds the table metadata and receives
SELECT avg(revenue) FROM sales
It sends per-shard queries to the workers, such as
SELECT sum(revenue), count(revenue) FROM table_1001
SELECT sum … FROM table_1003
Worker node 1 holds Table_1001 and Table_1003, worker node 2 holds Table_1002 and Table_1004, and so on up to worker node N.]
Each node is PostgreSQL with Citus installed
1 shard = 1 PostgreSQL table
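The rewrite of avg(revenue) into per-shard sum and count can be sketched as follows. This is only an illustration of the arithmetic, not Citus code: the function names and the revenue values are made up, and the sketch assumes no NULL values.

```python
def worker_partial(shard_rows):
    """What a worker would return for one shard: (sum(revenue), count(revenue))."""
    return sum(shard_rows), len(shard_rows)


def coordinator_avg(partials):
    """Merge the per-shard (sum, count) pairs into one overall average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical revenue values per shard; shard names follow the slide.
shards = {
    "table_1001": [10.0, 20.0],
    "table_1002": [30.0],
    "table_1003": [40.0, 50.0, 60.0],
    "table_1004": [],
}

partials = [worker_partial(rows) for rows in shards.values()]
result = coordinator_avg(partials)
print(result)
```

Shipping sum and count, rather than each shard's own average, is what keeps the result correct when shards hold different numbers of rows, and only two numbers per shard cross the network.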

Unfork Citus using Extension APIs
CREATE EXTENSION citus;
• System catalogs – Distributed metadata
• Planner hook – Insert, Update, Delete, Select
• Executor hook – Insert, Update, Delete, Select
• Utility hook – Alter Table, Create Index, Vacuum, etc.
• Transaction & resources handling – file descriptors, etc.
• Background worker process – Maintenance processes (distributed deadlock detection, task tracker, etc.)
• Logical decoding – Online data migrations

3. PostgreSQL has transactions.
How to handle distributed transactions?

BEGIN
INSERT
UPDATE
SELECT
COMMIT
ROLLBACK

Consistency in Distributed Databases
1. 2PC: All participating nodes need to be up
2. Paxos: Achieves consensus with quorum
3. Raft: More understandable alternative to Paxos
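The availability difference between the first item and the other two can be reduced to the commit rule each one applies. The sketch below is a deliberate simplification with hypothetical function names: it models only "who must answer", not the actual 2PC, Paxos, or Raft protocols.

```python
def two_phase_commit_can_commit(votes):
    """2PC rule: every participant must respond and vote yes.

    `votes` holds True (yes), False (no), or None (node is down).
    """
    return all(v is True for v in votes)


def quorum_reached(acks):
    """Quorum rule (Paxos / Raft style): a strict majority must acknowledge."""
    yes = sum(1 for a in acks if a is True)
    return yes > len(acks) // 2


# Three nodes, one of which is down.
responses = [True, True, None]
print(two_phase_commit_can_commit(responses))
print(quorum_reached(responses))
```

With one of three nodes down, the 2PC rule cannot commit while the quorum rule can, which is the trade-off the list summarizes.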

Concurrency in Distributed Databases

What is a Lock?
• Protects against concurrent modifications.
• Locks are released at the end of a transaction.
Deadlocks

Transactions Block on 1st Conflicting Lock
Session 1:
BEGIN;
UPDATE data SET y = 2 WHERE x = 1;
<obtained lock on rows with x = 1>
COMMIT;
<all locks released>
Session 2:
BEGIN;
UPDATE data SET y = 5 WHERE x = 1;
<waiting for lock on rows with x = 1>
COMMIT;

Transactions and Concurrency
• Transactions that don’t modify the same row can run concurrently.
Session 1:
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
Session 2:
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
COMMIT;

But what if they start blocking each other?
(Distributed) deadlock!

Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that are waiting for each other.

Transactions are cancelled until the cycle is gone.
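The idea of a wait-for graph and "cancel until the cycle is gone" can be sketched in a few lines. This is not PostgreSQL's detector: the function names, the transaction names, and the victim policy (cancel the last transaction found in the cycle) are all assumptions made for illustration.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    `waits_for` maps each waiting transaction to the set of transactions
    holding the locks it needs.
    """
    done = set()

    def visit(node, path):
        if node in path:
            return path[path.index(node):]   # found a cycle
        if node in done:
            return None                      # already explored, no cycle here
        path.append(node)
        for blocker in waits_for.get(node, ()):
            cycle = visit(blocker, path)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for node in list(waits_for):
        cycle = visit(node, [])
        if cycle:
            return cycle
    return None


def resolve_deadlocks(waits_for):
    """Cancel transactions until no cycle remains; return the cancelled ones."""
    graph = {txn: set(blockers) for txn, blockers in waits_for.items()}
    cancelled = []
    while True:
        cycle = find_cycle(graph)
        if cycle is None:
            return cancelled
        victim = cycle[-1]                   # hypothetical victim policy
        cancelled.append(victim)
        graph.pop(victim, None)
        for blockers in graph.values():
            blockers.discard(victim)


# txn1 and txn2 wait on each other; txn3 merely waits behind txn1.
waits = {"txn1": {"txn2"}, "txn2": {"txn1"}, "txn3": {"txn1"}}
cancelled = resolve_deadlocks(waits)
print(cancelled)
```

Note that txn3 is never cancelled: it is blocked, but it is not part of a cycle, so it simply proceeds once txn1 finishes.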

Deadlocks in Citus
Citus delegates transactions to nodes.

PostgreSQL’s deadlock detector still works.

When deadlocks span across nodes, PostgreSQL cannot help us.

Deadlock detection in Citus 7
Citus 7 adds distributed deadlock detection.
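Why a per-node detector is not enough, and what a distributed one has to add, can be shown with a small sketch: each node's wait-for graph is acyclic on its own, but merging the graphs by distributed transaction id reveals the cycle. The function names, the node names, and the ids `dtx1` / `dtx2` are hypothetical, and this is not the Citus implementation.

```python
def merge_wait_graphs(per_node_edges):
    """Combine (waiter, blocker) edges from all nodes into one graph."""
    merged = {}
    for edges in per_node_edges.values():
        for waiter, blocker in edges:
            merged.setdefault(waiter, set()).add(blocker)
    return merged


def has_cycle(graph):
    """Depth-first search for a cycle in a wait-for graph."""
    state = {}  # 1 = visit in progress, 2 = finished

    def visit(node):
        if state.get(node) == 1:
            return True
        if state.get(node) == 2:
            return False
        state[node] = 1
        if any(visit(blocker) for blocker in graph.get(node, ())):
            return True
        state[node] = 2
        return False

    return any(visit(node) for node in list(graph))


# Edges are (waiter, blocker), keyed by distributed transaction id.
per_node = {
    "worker1": [("dtx2", "dtx1")],  # on worker 1, dtx2 waits for dtx1
    "worker2": [("dtx1", "dtx2")],  # on worker 2, dtx1 waits for dtx2
}

local_results = {
    node: has_cycle(merge_wait_graphs({node: edges}))
    for node, edges in per_node.items()
}
global_result = has_cycle(merge_wait_graphs(per_node))
print(local_results, global_result)
```

Each worker sees a single harmless wait, so its local detector stays silent; only the merged view shows that the two transactions are waiting on each other.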


Distributed transactions are… a complex topic
• Most articles on distributed transactions focus on data consistency.
• Data consistency is only one side of the coin. If you’re using a relational database, your application benefits from another key feature: deadlock detection.
• https://www.citusdata.com/blog/2017/08/31/databases-and-distributed-deadlocks-a-faq

So now what? We talked about 3 challenges distributing Postgres…
1. PostgreSQL, Replication, High Availability
2. Tradeoffs in different approaches to building a distributed database, and how we chose PostgreSQL’s extension APIs
3. Distributed deadlock detection & distributed transactions

“SQL is hard, not impossible, to scale”
