This talk was delivered on FOSDEM PGDay 2016, on 29th of January. It discusses the options to stream consistent snapshot of the data existing in a database prior to creating a logical replication slot.
3. Introduction
What is Logical Decoding all about?
● A new feature of PostgreSQL since version 9.4.
● Allows streaming database changes in a custom format.
● Requires an Output Plugin to be written (yes, in C).
● Consistent snapshot before all the changes?
14. Problems
● OpenVPN quickly becomes the bottleneck on the laptop
Obvious solution: deploy workers closer to the database.
Docker + Mesosphere DCOS
https://zalando-techmonkeys.github.io/
15. Problems
● Workers quickly run out of memory
The (problematic) code:
cursor.execute("SELECT * FROM mytable")
16. Problems
● Workers quickly run out of memory
The (problematic) code:
cursor.execute("SELECT * FROM mytable")
● Invokes PQexec().
● Async. connection doesn’t help.
● psycopg2 is not designed to stream results.
17. Problems
● Invoke COPY protocol!
Corrected code:
cursor.copy_expert(
"COPY (SELECT * FROM mytable) TO STDOUT",
...)
18. Problems
● Invoke COPY protocol!
Corrected code:
cursor.copy_expert(
"COPY (SELECT * FROM mytable) TO STDOUT",
...)
● Tried 32 MB, then 64 MB per worker: it was not enough...
● One of the values was around 80 MB(!). Not much we can do.
19. More Problems?
● More problems with this code
The correct(?) code:
cursor.copy_expert(
"COPY (SELECT * FROM mytable) TO STDOUT",
...)
20. More Problems?
● More problems with this code
The correct(?) code:
cursor.copy_expert(
"COPY (SELECT * FROM mytable) TO STDOUT",
...)
● SELECT * FROM [ONLY] myschema.mytable
21. NoSQL?
● How about some JSON for comparison?
SELECT row_to_json(x.*) FROM mytable AS x
● Slows down the export 2-3 times.
● Not 100% equivalent to what output plugin emits.
● Have to write a C function for every plugin.
22. What if we would write a generic function...
CREATE FUNCTION pg_logical_slot_stream_relation(
IN slot_name NAME,
IN relnamespace NAME,
IN relname NAME,
IN nochildren BOOL DEFAULT FALSE,
VARIADIC options TEXT[] DEFAULT '{}',
OUT data TEXT
)
RETURNS SETOF TEXT ...
23. The Final Code
cursor.copy_expert(
"COPY (SELECT pg_logical_slot_stream_relation(...)) TO STDOUT",
...)
● Do not use SELECT … FROM pg_logical_slot_… – it caches result in the backend.
● Requires changes to core PostgreSQL.
● Ideally should not require a slot, only a snapshot.
● Slots cannot be used concurrently (yet).
24. At Last: Some Numbers
6 client processes, SSL (no compression), 1Gbit/s network interface
Query Run Time Volume Notes
SELECT * FROM … 7.5 h 2.7 TB 105 MB/s
pglogical / JSON 17.5 h 6.7 TB 112 MB/s
pglogical / native 30+ h (incomplete) 11+ TB 106 MB/s
Bottled Water / Avro 13.5 h 5.0 TB 108 MB/s
25. Space for Improvement
In native protocol format pglogical_output emits metadata per each tuple.
● Metadata overhead: 5.0 TB (167%)
○ nspname + relname + attnames
● Protocol overhead: 1.5 TB (50%)
○ message type + lengths
Set plugin option relmeta_cache_size to -1.
26. ● Network is apparently the bottleneck.
● What if we enable SSL compression?..
A Common Number: ~110 MB/s
27. sslcompression=1?
● Nowadays seems to be really hard to do, re: CRIME vulnerability.
● Older distro versions: set env. OPENSSL_DEFAULT_ZLIB
● Newer distro versions: OpenSSL is compiled without zlib. Good luck!
● TLSv1.3 will remove support for compression.
● HINT: your streaming replication is likely to be running uncompressed.
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-4929
29. Compression FTW!
24 client processes, SSL (with compression)
Query Run Time Volume Notes
SELECT * FROM … 3h (vs. 7.5 h) 2.7 TB
pglogical / JSON 7-8* h (vs. 17.5 h) 6.7 TB *ordering
pglogical / native 8-9* h (vs. 30+ h) 7.2 TB (vs. 11+ TB)
Bottled Water / Avro 10 h (vs. 13.5 h) 5.0 TB
30. In Conclusion
● Set relmeta_cache_size with pglogical_output native.
● Run a benchmark to see if you need compression.
● Order tables from largest to smallest.
● Do listen on the replication slot once the export is finished.
● Help needed: review the proposed streaming interface!