SlideShare ist ein Scribd-Unternehmen logo
1 von 57
Downloaden Sie, um offline zu lesen
Chicago PostgreSQL User Group - October 20, 2021 Jonathan S. Katz
Let's Build a Complex, Real-
Time Data Management
Application
• VP, Platform Engineering @ Crunchy Data
• Previously: Engineering Leadership @ Startups
• Longtime PostgreSQL community contributor
• Core Team Member
• Various Governance Committees
• Conference Organizer / Speaker
• @jkatz05
About Me
• Leading Team in Postgres – 10 contributors
• Certified Open Source PostgreSQL Distribution
• Leader in Postgres Technology for Kubernetes
• Crunchy Bridge: Fully managed cloud service
Crunchy Data
Your partner in deploying
open source PostgreSQL
throughout your enterprise.
CPSM Provider Plugin
This talk introduces many different tools and techniques available
in PostgreSQL for building applications.
It introduces different features and where to find out more
information.
We have a lot of material to cover in a short time - the slides and
demonstrations will be made available
How to Approach This Talk
CPSM Provider Plugin
Imagine we are managing virtual rooms for an event platform.
We have a set of operating hours in which the rooms can be
booked.
Only one booking can occur in a virtual room at a given time.
The Problem
CPSM Provider Plugin
For Example
CPSM Provider Plugin
We need to know...
- All the rooms that are available to book
- When the rooms are available to be booked (operating hours)
- When the rooms have been booked
And...
The system needs to be able to CRUD fast
(Create, Read, Update, Delete. Fast).
Specifications
🤔
Interlude:
Finding Availability
CPSM Provider Plugin
Availability can be thought about in three ways:
Closed
Available
Unavailable (or "booked")
Our ultimate "calendar tuple" is (room, status, range)
Managing Availability
CPSM Provider Plugin
PostgreSQL 9.2 introduced "range types" that included the ability to store and
efficiently search over ranges of data.
Built-in:
Date, Timestamps
Integer, Numeric
Lookups (e.g. overlaps) can be sped up using GiST indexes
Postgres Range Types
SELECT tstzrange('2021-10-28 09:30'::timestamptz, '2021-10-28 10:30'::timestamptz);
Availability
Availability
SELECT *
FROM (
VALUES
('closed', tstzrange('2021-10-28 0:00', '2021-10-28 8:00')),
('available', tstzrange('2021-10-28 08:00', '2021-10-28 09:30')),
('unavailable', tstzrange('2021-10-28 09:30', '2021-10-28 10:30')),
('available', tstzrange('2021-10-28 10:30', '2021-10-28 16:30')),
('unavailable', tstzrange('2021-10-28 16:30', '2021-10-28 18:30')),
('available', tstzrange('2021-10-28 18:30', '2021-10-28 20:00')),
('closed', tstzrange('2021-10-28 20:00', '2021-10-29 0:00'))
) x(status, calendar_range)
ORDER BY lower(x.calendar_range);
Easy, Right?
CPSM Provider Plugin
Insert new ranges and dividing them up
PostgreSQL did not work well with noncontiguous ranges…until PostgreSQL 14
Availability
Just for one day - what about other days?
What happens with data in the past?
What happens with data in the future?
Unavailability
Ensure no double-bookings
Overlapping Events?
Handling multiple spaces
But…
Managing Availability
availability_rule
id <serial> PRIMARY KEY
room_id <int> REFERENCES (room)
days_of_week <int[]>
start_time <time>
end_time <time>
generate_weeks_into_future <int>
DEFAULT 52
room
id <serial>
PRIMARY KEY
name <text>
availability
id <serial> PRIMARY KEY
room_id <int> REFERENCES
(room)
availability_rule_id <int>
REFERENCES (availabilityrule)
available_date <date>
available_range <tstzrange>
unavailability
id <serial> PRIMARY KEY
room_id <int> REFERENCES
(room)
unavailable_date <date>
unavailable_range <tstzrange>
calendar
id <serial> PRIMARY KEY
room_id <int> REFERENCES
(room)
status <text> DOMAIN:
{available, unavailable, closed}
calendar_date <date>
calendar_range <tstzrange>
CPSM Provider Plugin
We can now store data, but what about:
Generating initial calendar?
Generating availability based on rules?
Generating unavailability?
Sounds like we need to build an application
Managing Availability
CPSM Provider Plugin
To build our application, there are a few topics we will need to explore first:
generate_series
Recursive queries
Ranges and Multiranges
SQL Functions
Set returning functions
PL/pgsql
Triggers
Managing Availability
CPSM Provider Plugin
Generate series is a "set returning" function, i.e. a function that can return
multiple rows of data.
Generate series can return:
A set of numbers (int, bigint, numeric) either incremented by 1 or some
other integer interval
A set of timestamps incremented by a time interval(!!)
generate_series:
More Than Just For Test Data
SELECT x::date
FROM generate_series(
'2021-01-01'::date, '2021-12-31'::date, '1 day'::interval
) x;
CPSM Provider Plugin
PostgreSQL 8.4 introduced the "WITH" syntax and with it also introduced the
ability to perform recursive queries
WITH RECURSIVE ... AS ()
Base case vs. recursive case
UNION vs. UNION ALL
CAN HIT INFINITE LOOPS
Recursion in SQL?
CPSM Provider Plugin
Recursion in SQL?
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
)
SELECT fac.n, fac.i
FROM fac;
Infinite Recursion
CPSM Provider Plugin
Recursion in SQL?
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
)
SELECT fac.n, fac.i
FROM fac
LIMIT 100;
Postgres 14 introduces multirange types
Ordered list of ranges
Can be noncontiguous
Adds range aggregates: range_agg and unnest
Multirange Types
SELECT
datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 1),
daterange(CURRENT_DATE + 5, CURRENT_DATE + 8),
daterange(CURRENT_DATE + 15, CURRENT_DATE + 22)
);
CPSM Provider Plugin
PostgreSQL provides the ability to write functions to help encapsulate
repeated behavior
PostgreSQL 11 introduces stored procedures which enables you to
embed transactions! PostgreSQL 14 adds the ability to get output from stored
procedures!
SQL functions have many properties, including:
Input / output
Volatility (IMMUTABLE, STABLE, VOLATILE) (default VOLATILE)
Parallel safety (default PARALLEL UNSAFE)
LEAKPROOF; SECURITY DEFINER
Execution Cost
Language type (more on this later)
Functions
CPSM Provider Plugin
Functions
CREATE OR REPLACE FUNCTION chipug_fac(n int)
RETURNS numeric
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT max(fac.n)
FROM fac;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
CPSM Provider Plugin
Functions
CREATE OR REPLACE FUNCTION chipug_fac_set(n int)
RETURNS SETOF numeric
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT fac.n
FROM fac
ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
CPSM Provider Plugin
Functions
CREATE OR REPLACE FUNCTION chipug_fac_table(n int)
RETURNS TABLE(n numeric)
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT fac.n
FROM fac
ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
CPSM Provider Plugin
PostgreSQL has the ability to load in procedural languages ("PL") and execute
code in them beyond SQL.
Built-in: pgSQL, Python, Perl, Tcl
Others: Javascript, R, Java, C, JVM, Container, LOLCODE, Ruby, PHP, Lua,
pgPSM, Scheme
Procedural Languages
CPSM Provider Plugin
PL/pgSQL
CREATE EXTENSION IF NOT EXISTS plpgsql;
CREATE OR REPLACE FUNCTION chipug_fac_plpgsql(n int)
RETURNS numeric
AS $$
DECLARE
fac numeric;
i int;
BEGIN
fac := 1;
FOR i IN 1..n LOOP
fac := fac * i;
END LOOP;
RETURN fac;
END;
$$ LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE;
CPSM Provider Plugin
Triggers are functions that can be called before/after/instead of an operation or event
Data changes (INSERT/UPDATE/DELETE)
Events (DDL, DCL, etc. changes)
Atomic
Must return "trigger" or "event_trigger"
(Return "NULL" in a trigger if you want to skip operation)
(Gotcha: RETURN OLD [INSERT] / RETURN NEW [DELETE])
Execute once per modified row or once per SQL statement
Multiple triggers on same event will execute in alphabetical order
Writeable in any PL language that defined trigger interface
Triggers
Building a
Synchronized System
We'll Scan the Code
It's Available for Download 😉
The Test
CPSM Provider Plugin
[Test your live demos before running them, and you will have much
success!]
availability_rule inserts took some time, > 350ms
availability: INSERT 52
calendar: INSERT 52 from nontrivial function
Updates on individual availability / unavailability are not too painful
Lookups are faaaaaaaast
Lessons of the Test
How About At (Web) Scale?
CPSM Provider Plugin
Recursive CTE 😢
Even with only 100 more rooms with a few set of rules, rule
generation time increased significantly
Multirange Types
These are still pretty fast and are handling scaling up well.
May still be slow for a web transaction.
Lookups are still lightning fast!
Web Scale
CPSM Provider Plugin
Added in PostgreSQL 9.4
Replays all logical changes made to the database
Create a logical replication slot in your database
Only one receiver can consume changes from one slot at a time
Slot keeps track of last change that was read by a receiver
If receiver disconnects, slot will ensure database holds changes until
receiver reconnects
Only changes from tables with primary keys are relayed
As of PostgreSQL 10, you can set a "REPLICA IDENTITY" on a
UNIQUE, NOT NULL, non-deferrable, non-partial column(s)
Basis for Logical Replication
Logical Decoding
CPSM Provider Plugin
A logical replication slot has a name and an output plugin
PostgreSQL comes with the "test" output plugin
Have to write a custom parser to read changes from test output plugin
Several output plugins and libraries available
wal2json: https://github.com/eulerto/wal2json
jsoncdc: https://github.com/instructure/jsoncdc
Debezium: http://debezium.io/
(Test: https://www.postgresql.org/docs/current/static/test-decoding.html)
Logical Replication (pgoutput)
Every data change in the database is streamed
Need to be aware of the logical decoding format
Logical Decoding Out of the Box
CPSM Provider Plugin
C: libpq
pg_recvlogical
PostgreSQL functions
Python: psycopg2 - version 2.7
JDBC: version 42
Go: pgx
JavaScript: node-postgres (pg-logical-replication)
Driver Support
CPSM Provider Plugin
Using Logical Decoding
CPSM Provider Plugin
We know it takes time to regenerate calendar
Want to ensure changes always propagate but want to ensure all users
(managers, calendar searchers) have good experience
Thoughts🤔
CPSM Provider Plugin
Will use the same data model as before as well as the same helper
functions, but without the triggers
We will have a Python script that reads from a logical replication
slot and if it detects a relevant change, take an action
Similar to what we did with triggers, but this moves the work to
OUTSIDE the transaction
BUT...we can confirm whether or not the work is completed, thus if
the program fails, we can restart from last acknowledged
transaction ID
Replacing Triggers
Reviewing the Code
CPSM Provider Plugin
A consumer of the logical stream can only read one change at a time
If our processing of a change takes a lot of time, it will create a backlog
of changes
Backlog means the PostgreSQL server needs to retain more WAL logs
Retaining too many WAL logs can lead to running out of disk space
Running out of disk space can lead to...rough times.
The Consumer Bottleneck
🌤
🌥
☁
🌩
Eliminating the Bottleneck
CPSM Provider Plugin
Can utilize a durable message queueing system to store any WAL changes
that are necessary to perform post-processing on
Ensure the changes are worked on in order
"Divide-and-conquer" workload - have multiple workers acting on
different "topics"
Remove WAL bloat
Shifting the Workload
CPSM Provider Plugin
Durable message processing and distribution system
Streams
Supports parallelization of consumers
Multiple consumers, partitions
Highly-available, distributed architecture
Acknowledgement of receiving, processing messages; can replay (sounds like
WAL?)
Can also accomplish this with Debezium, which interfaces with Kafka +
Postgres
Apache Kafka
CPSM Provider Plugin
Architecture
CPSM Provider Plugin
WAL Consumer
import json, sys
from kafka import KafkaProducer
from kafka.errors import KafkaError
import psycopg2
import psycopg2.extras
TABLES = set([
'availability', 'availability_rule', 'room', 'unavailability',
])
reader = WALConsumer()
cursor = reader.connection.cursor()
cursor.start_replication(slot_name='schedule', decode=True)
try:
cursor.consume_stream(reader)
except KeyboardInterrupt:
print("Stopping reader...")
finally:
cursor.close()
reader.connection.close()
print("Exiting reader")
CPSM Provider Plugin
class WALConsumer(object):
def __init__(self):
self.connection = psycopg2.connect("dbname=realtime",
connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
self.producer = producer = KafkaProducer(
bootstrap_servers=['localhost:9092'],
value_serializer=lambda m: json.dumps(m).encode('ascii'),
)
def __call__(self, msg):
payload = json.loads(msg.payload, strict=False)
print(payload)
# determine if the payload should be passed on to a consumer
listening
# to the Kafka que
for data in payload['change']:
if data.get('table') in TABLES:
self.producer.send(data.get('table'), data)
# ensure everything is sent; call flush at this point
self.producer.flush()
# acknowledge that the change has been read - tells PostgreSQL to
stop
# holding onto this log file
msg.cursor.send_feedback(flush_lsn=msg.data_start)
CPSM Provider Plugin
Kafka Consumer
import json
from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition
import psycopg2
class Worker(object):
"""Base class to work perform any post processing on changes"""
OPERATIONS = set([]) # override with "insert", "update", "delete"
def __init__(self, topic):
# connect to the PostgreSQL database
self.connection = psycopg2.connect("dbname=realtime")
# connect to Kafka
self.consumer = KafkaConsumer(
bootstrap_servers=['localhost:9092'],
value_deserializer=lambda m: json.loads(m.decode('utf8')),
auto_offset_reset="earliest",
group_id='1')
# subscribe to the topic(s)
self.consumer.subscribe(topic if isinstance(topic, list) else [topic])
CPSM Provider Plugin
Kafka Consumer
def run(self):
"""Function that runs ad-infinitum"""
# loop through the payloads from the consumer
# determine if there are any follow-up actions based on the kind of
# operation, and if so, act upon it
# always commit when done.
for msg in self.consumer:
print(msg)
# load the data from the message
data = msg.value
# determine if there are any follow-up operations to perform
if data['kind'] in self.OPERATIONS:
# open up a cursor for interacting with PostgreSQL
cursor = self.connection.cursor()
# put the parameters in an easy to digest format
params = dict(zip(data['columnnames'], data['columnvalues']))
# all the function
getattr(self, data['kind'])(cursor, params)
# commit any work that has been done, and close the cursor
self.connection.commit()
cursor.close()
# acknowledge the message has been handled
tp = TopicPartition(msg.topic, msg.partition)
offsets = {tp: OffsetAndMetadata(msg.offset, None)}
self.consumer.commit(offsets=offsets)
CPSM Provider Plugin
Kafka Consumer
# override with the appropriate post-processing code
def insert(self, cursor, params):
"""Override with any post-processing to be done on an ``INSERT``"""
raise NotImplementedError()
def update(self, cursor, params):
"""Override with any post-processing to be done on an ``UPDATE``"""
raise NotImplementedError()
def delete(self, cursor, params):
"""Override with any post-processing to be done on an ``DELETE``"""
raise NotImplementedError()
Testing the Application
CPSM Provider Plugin
Logical decoding allows the bulk inserts to occur significantly faster from a
transactional view
Potential bottleneck for long running execution, but bottlenecks are isolated to
specific queues
Newer versions of PostgreSQL has features that make it easier to build
applications and scale
Lessons
CPSM Provider Plugin
PostgreSQL is robust.
Triggers will keep your data in sync but can have significant
performance overhead
Utilizing a logical replication slot can eliminate trigger overhead
and transfer the computational load elsewhere
Not a panacea: still need to use good architectural patterns!
Conclusion
Thank You
jonathan.katz@crunchydata.com
@jkatz05
https://github.com/CrunchyData/postgres-realtime-demo

Weitere ähnliche Inhalte

Was ist angesagt?

Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
Jonathan Katz
 
Understanding PostgreSQL LW Locks
Understanding PostgreSQL LW LocksUnderstanding PostgreSQL LW Locks
Understanding PostgreSQL LW Locks
Jignesh Shah
 

Was ist angesagt? (20)

Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
 
introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack
 
押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)
押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)
押さえておきたい、PostgreSQL 13 の新機能!!(Open Source Conference 2021 Online/Hokkaido 発表資料)
 
Deep Dive into the Linux Kernel - メモリ管理におけるCompaction機能について
Deep Dive into the Linux Kernel - メモリ管理におけるCompaction機能についてDeep Dive into the Linux Kernel - メモリ管理におけるCompaction機能について
Deep Dive into the Linux Kernel - メモリ管理におけるCompaction機能について
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQL
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
Understanding PostgreSQL LW Locks
Understanding PostgreSQL LW LocksUnderstanding PostgreSQL LW Locks
Understanding PostgreSQL LW Locks
 
Ansible troubleshooting 101_202007
Ansible troubleshooting 101_202007Ansible troubleshooting 101_202007
Ansible troubleshooting 101_202007
 
PostgreSQL: Advanced indexing
PostgreSQL: Advanced indexingPostgreSQL: Advanced indexing
PostgreSQL: Advanced indexing
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performance
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)
PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)
PostgreSQL 14 モニタリング新機能紹介(PostgreSQL カンファレンス #24、2021/06/08)
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slides
 
PostgreSQLのgitレポジトリから見える2022年の開発状況(第38回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのgitレポジトリから見える2022年の開発状況(第38回PostgreSQLアンカンファレンス@オンライン 発表資料)PostgreSQLのgitレポジトリから見える2022年の開発状況(第38回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLのgitレポジトリから見える2022年の開発状況(第38回PostgreSQLアンカンファレンス@オンライン 発表資料)
 
nGram full text search (by 이성욱)
nGram full text search (by 이성욱)nGram full text search (by 이성욱)
nGram full text search (by 이성욱)
 
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
Connection Pooling in PostgreSQL using pgbouncer
Connection Pooling in PostgreSQL using pgbouncer Connection Pooling in PostgreSQL using pgbouncer
Connection Pooling in PostgreSQL using pgbouncer
 

Ähnlich wie Build a Complex, Realtime Data Management App with Postgres 14!

Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Ontico
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
PostgresOpen
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
Nikolay Samokhvalov
 

Ähnlich wie Build a Complex, Realtime Data Management App with Postgres 14! (20)

PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTO
 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLHBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
 
Spock Framework - Slidecast
Spock Framework - SlidecastSpock Framework - Slidecast
Spock Framework - Slidecast
 
Spock Framework
Spock FrameworkSpock Framework
Spock Framework
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series database
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
 
Intro to Scalable Deep Learning on AWS with Apache MXNet
Intro to Scalable Deep Learning on AWS with Apache MXNetIntro to Scalable Deep Learning on AWS with Apache MXNet
Intro to Scalable Deep Learning on AWS with Apache MXNet
 
Jdbc oracle
Jdbc oracleJdbc oracle
Jdbc oracle
 
Linuxfest Northwest 2022 - MySQL 8.0 Nre Features
Linuxfest Northwest 2022 - MySQL 8.0 Nre FeaturesLinuxfest Northwest 2022 - MySQL 8.0 Nre Features
Linuxfest Northwest 2022 - MySQL 8.0 Nre Features
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
JDBC Connecticity.ppt
JDBC Connecticity.pptJDBC Connecticity.ppt
JDBC Connecticity.ppt
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Database
 

Mehr von Jonathan Katz

Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAM
Jonathan Katz
 

Mehr von Jonathan Katz (11)

Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!
 
Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAM
 
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & Kubernetes
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data Types
 
Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
Indexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data Types
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 

Build a Complex, Realtime Data Management App with Postgres 14!

  • 1. Chicago PostgreSQL User Group - October 20, 2021 Jonathan S. Katz Let's Build a Complex, Real- Time Data Management Application
  • 2. • VP, Platform Engineering @ Crunchy Data • Previously: Engineering Leadership @ Startups • Longtime PostgreSQL community contributor • Core Team Member • Various Governance Committees • Conference Organizer / Speaker • @jkatz05 About Me
  • 3. • Leading Team in Postgres – 10 contributors • Certified Open Source PostgreSQL Distribution • Leader in Postgres Technology for Kubernetes • Crunchy Bridge: Fully managed cloud service Crunchy Data Your partner in deploying open source PostgreSQL throughout your enterprise.
  • 4. CPSM Provider Plugin This talk introduces many different tools and techniques available in PostgreSQL for building applications. It introduces different features and where to find out more information. We have a lot of material to cover in a short time - the slides and demonstrations will be made available How to Approach This Talk
  • 5. CPSM Provider Plugin Imagine we are managing virtual rooms for an event platform. We have a set of operating hours in which the rooms can be booked. Only one booking can occur in a virtual room at a given time. The Problem
  • 7. CPSM Provider Plugin We need to know... - All the rooms that are available to book - When the rooms are available to be booked (operating hours) - When the rooms have been booked And... The system needs to be able to CRUD fast (Create, Read, Update, Delete. Fast). Specifications
  • 10. CPSM Provider Plugin Availability can be thought about in three ways: Closed Available Unavailable (or "booked") Our ultimate "calendar tuple" is (room, status, range) Managing Availability
  • 11. CPSM Provider Plugin PostgreSQL 9.2 introduced "range types" that included the ability to store and efficiently search over ranges of data. Built-in: Date, Timestamps Integer, Numeric Lookups (e.g. overlaps) can be sped up using GiST indexes Postgres Range Types SELECT tstzrange('2021-10-28 09:30'::timestamptz, '2021-10-28 10:30'::timestamptz);
  • 13. Availability SELECT * FROM ( VALUES ('closed', tstzrange('2021-10-28 0:00', '2021-10-28 8:00')), ('available', tstzrange('2021-10-28 08:00', '2021-10-28 09:30')), ('unavailable', tstzrange('2021-10-28 09:30', '2021-10-28 10:30')), ('available', tstzrange('2021-10-28 10:30', '2021-10-28 16:30')), ('unavailable', tstzrange('2021-10-28 16:30', '2021-10-28 18:30')), ('available', tstzrange('2021-10-28 18:30', '2021-10-28 20:00')), ('closed', tstzrange('2021-10-28 20:00', '2021-10-29 0:00')) ) x(status, calendar_range) ORDER BY lower(x.calendar_range);
  • 15. CPSM Provider Plugin Insert new ranges and dividing them up PostgreSQL did not work well with noncontiguous ranges…until PostgreSQL 14 Availability Just for one day - what about other days? What happens with data in the past? What happens with data in the future? Unavailability Ensure no double-bookings Overlapping Events? Handling multiple spaces But…
  • 16. Managing Availability availability_rule id <serial> PRIMARY KEY room_id <int> REFERENCES (room) days_of_week <int[]> start_time <time> end_time <time> generate_weeks_into_future <int> DEFAULT 52 room id <serial> PRIMARY KEY name <text> availability id <serial> PRIMARY KEY room_id <int> REFERENCES (room) availability_rule_id <int> REFERENCES (availabilityrule) available_date <date> available_range <tstzrange> unavailability id <serial> PRIMARY KEY room_id <int> REFERENCES (room) unavailable_date <date> unavailable_range <tstzrange> calendar id <serial> PRIMARY KEY room_id <int> REFERENCES (room) status <text> DOMAIN: {available, unavailable, closed} calendar_date <date> calendar_range <tstzrange>
  • 17. CPSM Provider Plugin We can now store data, but what about: Generating initial calendar? Generating availability based on rules? Generating unavailability? Sounds like we need to build an application Managing Availability
  • 18. CPSM Provider Plugin To build our application, there are a few topics we will need to explore first: generate_series Recursive queries Ranges and Multiranges SQL Functions Set returning functions PL/pgsql Triggers Managing Availability
  • 19. CPSM Provider Plugin Generate series is a "set returning" function, i.e. a function that can return multiple rows of data. Generate series can return: A set of numbers (int, bigint, numeric) either incremented by 1 or some other integer interval A set of timestamps incremented by a time interval(!!) generate_series: More Than Just For Test Data SELECT x::date FROM generate_series( '2021-01-01'::date, '2021-12-31'::date, '1 day'::interval ) x;
  • 20. CPSM Provider Plugin PostgreSQL 8.4 introduced the "WITH" syntax and with it also introduced the ability to perform recursive queries WITH RECURSIVE ... AS () Base case vs. recursive case UNION vs. UNION ALL CAN HIT INFINITE LOOPS Recursion in SQL?
  • 21. CPSM Provider Plugin Recursion in SQL? WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac ) SELECT fac.n, fac.i FROM fac; Infinite Recursion
  • 22. CPSM Provider Plugin Recursion in SQL? WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac ) SELECT fac.n, fac.i FROM fac LIMIT 100;
  • 23. Postgres 14 introduces multirange types Ordered list of ranges Can be noncontiguous Adds range aggregates: range_agg and unnest Multirange Types SELECT datemultirange( daterange(CURRENT_DATE, CURRENT_DATE + 1), daterange(CURRENT_DATE + 5, CURRENT_DATE + 8), daterange(CURRENT_DATE + 15, CURRENT_DATE + 22) );
  • 24. CPSM Provider Plugin PostgreSQL provides the ability to write functions to help encapsulate repeated behavior PostgreSQL 11 introduces stored procedures which enables you to embed transactions! PostgreSQL 14 adds the ability to get output from stored procedures! SQL functions have many properties, including: Input / output Volatility (IMMUTABLE, STABLE, VOLATILE) (default VOLATILE) Parallel safety (default PARALLEL UNSAFE) LEAKPROOF; SECURITY DEFINER Execution Cost Language type (more on this later) Functions
  • 25. CPSM Provider Plugin Functions CREATE OR REPLACE FUNCTION chipug_fac(n int) RETURNS numeric AS $$ WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac WHERE i + 1 <= $1 ) SELECT max(fac.n) FROM fac; $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 26. CPSM Provider Plugin Functions CREATE OR REPLACE FUNCTION chipug_fac_set(n int) RETURNS SETOF numeric AS $$ WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac WHERE i + 1 <= $1 ) SELECT fac.n FROM fac ORDER BY fac.n; $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 27. CPSM Provider Plugin Functions CREATE OR REPLACE FUNCTION chipug_fac_table(n int) RETURNS TABLE(n numeric) AS $$ WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac WHERE i + 1 <= $1 ) SELECT fac.n FROM fac ORDER BY fac.n; $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 28. CPSM Provider Plugin PostgreSQL has the ability to load in procedural languages ("PL") and execute code in them beyond SQL. Built-in: pgSQL, Python, Perl, Tcl Others: Javascript, R, Java, C, JVM, Container, LOLCODE, Ruby, PHP, Lua, pgPSM, Scheme Procedural Languages
  • 29. CPSM Provider Plugin PL/pgSQL CREATE EXTENSION IF NOT EXISTS plpgsql; CREATE OR REPLACE FUNCTION chipug_fac_plpgsql(n int) RETURNS numeric AS $$ DECLARE fac numeric; i int; BEGIN fac := 1; FOR i IN 1..n LOOP fac := fac * i; END LOOP; RETURN fac; END; $$ LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE;
  • 30. CPSM Provider Plugin Triggers are functions that can be called before/after/instead of an operation or event Data changes (INSERT/UPDATE/DELETE) Events (DDL, DCL, etc. changes) Atomic Must return "trigger" or "event_trigger" (Return "NULL" in a trigger if you want to skip operation) (Gotcha: RETURN OLD [INSERT] / RETURN NEW [DELETE]) Execute once per modified row or once per SQL statement Multiple triggers on same event will execute in alphabetical order Writeable in any PL language that defined trigger interface Triggers
  • 32. We'll Scan the Code It's Available for Download 😉
  • 34. CPSM Provider Plugin [Test your live demos before running them, and you will have much success!] availability_rule inserts took some time, > 350ms availability: INSERT 52 calendar: INSERT 52 from nontrivial function Updates on individual availability / unavailability are not too painful Lookups are faaaaaaaast Lessons of the Test
  • 35. How About At (Web) Scale?
  • 36. CPSM Provider Plugin Recursive CTE 😢 Even with only 100 more rooms with a few set of rules, rule generation time increased significantly Multirange Types These are still pretty fast and are handling scaling up well. May still be slow for a web transaction. Lookups are still lightning fast! Web Scale
  • 37. CPSM Provider Plugin Added in PostgreSQL 9.4 Replays all logical changes made to the database Create a logical replication slot in your database Only one receiver can consume changes from one slot at a time Slot keeps track of last change that was read by a receiver If receiver disconnects, slot will ensure database holds changes until receiver reconnects Only changes from tables with primary keys are relayed As of PostgreSQL 10, you can set a "REPLICA IDENTITY" on a UNIQUE, NOT NULL, non-deferrable, non-partial column(s) Basis for Logical Replication Logical Decoding
  • 38. CPSM Provider Plugin A logical replication slot has a name and an output plugin PostgreSQL comes with the "test" output plugin Have to write a custom parser to read changes from test output plugin Several output plugins and libraries available wal2json: https://github.com/eulerto/wal2json jsoncdc: https://github.com/instructure/jsoncdc Debezium: http://debezium.io/ (Test: https://www.postgresql.org/docs/current/static/test-decoding.html) Logical Replication (pgoutput) Every data change in the database is streamed Need to be aware of the logical decoding format Logical Decoding Out of the Box
  • 39. CPSM Provider Plugin C: libpq pg_recvlogical PostgreSQL functions Python: psycopg2 - version 2.7 JDBC: version 42 Go: pgx JavaScript: node-postgres (pg-logical-replication) Driver Support
  • 40. CPSM Provider Plugin Using Logical Decoding
  • 41. CPSM Provider Plugin We know it takes time to regenerate calendar Want to ensure changes always propagate but want to ensure all users (managers, calendar searchers) have good experience Thoughts🤔
  • 42. CPSM Provider Plugin Will use the same data model as before as well as the same helper functions, but without the triggers We will have a Python script that reads from a logical replication slot and if it detects a relevant change, take an action Similar to what we did with triggers, but this moves the work to OUTSIDE the transaction BUT...we can confirm whether or not the work is completed, thus if the program fails, we can restart from last acknowledged transaction ID Replacing Triggers
  • 44. CPSM Provider Plugin A consumer of the logical stream can only read one change at a time If our processing of a change takes a lot of time, it will create a backlog of changes Backlog means the PostgreSQL server needs to retain more WAL logs Retaining too many WAL logs can lead to running out of disk space Running out of disk space can lead to...rough times. The Consumer Bottleneck 🌤 🌥 ☁ 🌩
  • 46. CPSM Provider Plugin Can utilize a durable message queueing system to store any WAL changes that are necessary to perform post-processing on Ensure the changes are worked on in order "Divide-and-conquer" workload - have multiple workers acting on different "topics" Remove WAL bloat Shifting the Workload
  • 47. CPSM Provider Plugin Durable message processing and distribution system Streams Supports parallelization of consumers Multiple consumers, partitions Highly-available, distributed architecture Acknowledgement of receiving, processing messages; can replay (sounds like WAL?) Can also accomplish this with Debezium, which interfaces with Kafka + Postgres Apache Kafka
  • 49. CPSM Provider Plugin WAL Consumer import json, sys from kafka import KafkaProducer from kafka.errors import KafkaError import psycopg2 import psycopg2.extras TABLES = set([ 'availability', 'availability_rule', 'room', 'unavailability', ]) reader = WALConsumer() cursor = reader.connection.cursor() cursor.start_replication(slot_name='schedule', decode=True) try: cursor.consume_stream(reader) except KeyboardInterrupt: print("Stopping reader...") finally: cursor.close() reader.connection.close() print("Exiting reader")
  • 50. CPSM Provider Plugin class WALConsumer(object): def __init__(self): self.connection = psycopg2.connect("dbname=realtime", connection_factory=psycopg2.extras.LogicalReplicationConnection, ) self.producer = producer = KafkaProducer( bootstrap_servers=['localhost:9092'], value_serializer=lambda m: json.dumps(m).encode('ascii'), ) def __call__(self, msg): payload = json.loads(msg.payload, strict=False) print(payload) # determine if the payload should be passed on to a consumer listening # to the Kafka que for data in payload['change']: if data.get('table') in TABLES: self.producer.send(data.get('table'), data) # ensure everything is sent; call flush at this point self.producer.flush() # acknowledge that the change has been read - tells PostgreSQL to stop # holding onto this log file msg.cursor.send_feedback(flush_lsn=msg.data_start)
  • 51. CPSM Provider Plugin Kafka Consumer import json from kafka import KafkaConsumer from kafka.structs import OffsetAndMetadata, TopicPartition import psycopg2 class Worker(object): """Base class to work perform any post processing on changes""" OPERATIONS = set([]) # override with "insert", "update", "delete" def __init__(self, topic): # connect to the PostgreSQL database self.connection = psycopg2.connect("dbname=realtime") # connect to Kafka self.consumer = KafkaConsumer( bootstrap_servers=['localhost:9092'], value_deserializer=lambda m: json.loads(m.decode('utf8')), auto_offset_reset="earliest", group_id='1') # subscribe to the topic(s) self.consumer.subscribe(topic if isinstance(topic, list) else [topic])
  • 52. CPSM Provider Plugin Kafka Consumer def run(self): """Function that runs ad-infinitum""" # loop through the payloads from the consumer # determine if there are any follow-up actions based on the kind of # operation, and if so, act upon it # always commit when done. for msg in self.consumer: print(msg) # load the data from the message data = msg.value # determine if there are any follow-up operations to perform if data['kind'] in self.OPERATIONS: # open up a cursor for interacting with PostgreSQL cursor = self.connection.cursor() # put the parameters in an easy to digest format params = dict(zip(data['columnnames'], data['columnvalues'])) # all the function getattr(self, data['kind'])(cursor, params) # commit any work that has been done, and close the cursor self.connection.commit() cursor.close() # acknowledge the message has been handled tp = TopicPartition(msg.topic, msg.partition) offsets = {tp: OffsetAndMetadata(msg.offset, None)} self.consumer.commit(offsets=offsets)
  • 53. CPSM Provider Plugin Kafka Consumer # override with the appropriate post-processing code def insert(self, cursor, params): """Override with any post-processing to be done on an ``INSERT``""" raise NotImplementedError() def update(self, cursor, params): """Override with any post-processing to be done on an ``UPDATE``""" raise NotImplementedError() def delete(self, cursor, params): """Override with any post-processing to be done on an ``DELETE``""" raise NotImplementedError()
  • 55. CPSM Provider Plugin Logical decoding allows the bulk inserts to occur significantly faster from a transactional view Potential bottleneck for long running execution, but bottlenecks are isolated to specific queues Newer versions of PostgreSQL has features that make it easier to build applications and scale Lessons
  • 56. CPSM Provider Plugin PostgreSQL is robust. Triggers will keep your data in sync but can have significant performance overhead Utilizing a logical replication slot can eliminate trigger overhead and transfer the computational load elsewhere Not a panacea: still need to use good architectural patterns! Conclusion