Learn how zheap works

HOW ZHEAP WORKS
REINVENTING POSTGRESQL STORAGE
BY HANS-JÜRGEN SCHÖNIG

ABOUT
ME AND MY
COMPANY
■ Who is the guy?
■ Who is CYBERTEC?

HANS-JÜRGEN
SCHÖNIG
CEO & SENIOR DATABASE CONSULTANT
■ PostgreSQL since 1999
■ author of various database books
M A I L hs@cybertec.at
P H O N E +43 2622 930 22-2
W E B www.cybertec-postgresql.com

DATABASE SERVICES
DATA Science
▪ Artiﬁcial Intelligence
▪ Machine Learning
▪ Big Data
▪ Business Intelligence
▪ Data Mining
▪ etc.
POSTGRESQL Services
▪ 24/7 Support
▪ Training
▪ Consulting
▪ Performance Tuning
▪ Clustering
▪ etc.

▪ ICT
▪ University
▪ Government
▪ Automotive
▪ Industry
▪ Trade
▪ Finance
▪ etc.
CLIENT
SECTORS

AGENDA
■ traditional tables
■ table bloat and VACUUM
■ Why a new storage system?
■ zheap: the goal
■ zheap: basic architecture
■ zheap: transaction slots, etc.
■ performance impacts
■ roadmap

HEAP: STANDARD TABLES
■ Data structure looks as follows:

■ Data structure looks as follows:
HEAP: STANDARD TABLES

HEAP AND TRANSACTIONS
UPDATES AND VISIBILITY

MAIN ISSUE: TABLE BLOAT
test=# CREATE TABLE a (aid int) WITH (autovacuum_enabled = off);
CREATE TABLE
test=# INSERT INTO a SELECT * FROM generate_series(1, 1000000);
INSERT 0 1000000
test=# SELECT pg_size_pretty(pg_relation_size('a'));
pg_size_pretty
----------------
35 MB
(1 row)

test=# UPDATE a SET aid = aid + 1;
UPDATE 1000000
pg_size_pretty
----------------
69 MB
(1 row)

test=# VACUUM VERBOSE a;
INFO: vacuuming "public.a"
INFO: "a": removed 1000000 row versions in 4425 pages
INFO: "a": found 1000000 removable, 1000000 nonremovable row versions in 8850
out of 8850 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 539
...
VACUUM
pg_size_pretty
----------------
69 MB
(1 row)

ONE WORD ABOUT VACUUM
■ VACUUM is not always allowed to
reallocate dead rows
■ A row must be REALLY dead for VACUUM
to do its job
■ Long transactions can be an enemy
→ Once you are in pain it tends not to go away

WAYS OUT
■ VACUUM FULL: Needs a table lock
■ pg_squeeze:
■ Shrinking tables with less locking
■ Move between tablespaces
■ Index organize tables
HINT: Try to avoid bloat in the ﬁrst place!

ZHEAP: DESIGN GOALS
■ Perform UPDATE in place
■ Have smaller tables
■ smaller tuple headers
■ improved alignment
■ Reduce writes as much as possible
■ avoid dirtying pages unless data is modiﬁed
■ normal heaps dirty pages in some cases during reads
■ Reuse space more quickly
■ Get rid of VACUUM

ZHEAP: TUPLE HEADERS
■ Heap: 20+ bytes per row
■ Zheap: 5 bytes per row
How can this be achieved?
■ The tuple header controls “visibility”
■ “Normalize tuple header”
■ Move visibility info to the page level

ZHEAP: TRANSACTION SLOTS
Transaction slots hold transactional visibility

ZHEAP: TRANSACTION SLOTS
Transaction slots:
■ 16 bytes of storage
■ contains the following information
■ transaction id
■ epoch
■ latest undo record pointer of that transaction
What if we need more slots?

ZHEAP: TPD PAGES
■ TPD: Store additional transaction slots if “4” is not enough
■ TPD pages are interleaved with normal pages
■

OPERATION: INSERT
■ Allocate a transaction slot
■ Emit an undo entry to ﬁx things on error
■ Space can be reclaimed instantly after a ROLLBACK
→ Most simplistic operation

OPERATION: UPDATE
■ More complicated:
■ The new row ﬁts into the old space
■ The new row does not ﬁt into the old space

OPERATION: UPDATE FITS
■ If the row is shorter:
■ We can overwrite it
■ Emit undo record
In short: We hold the new row in zheap and a copy of the old row in undo so
that we can copy it back to the old structure in case it is needed.

OPERATION: UPDATE DOESN’T FIT
■ Will be worse
■ DELETE old row
■ INSERT new row in a diﬀerent place
■ Less eﬃcient
Space can instantly be reclaimed in the following cases:
■ When updating a row to a shorter version
■ When non-inplace UPDATEs are performed

OPERATION: DELETE
■ How it works
■ Emit undo record
■ DELETE row from zheap
Old row can be moved back into zheap during ROLLBACK.

ROLLBACK
■ In case a ROLLBACK happens:
■ undo has to make sure that the old state of the table is restored.
■ Old rows have to be copied back
■ ROLLBACK takes longer !
Undo itself can be removed in three cases:
■ as soon as there are no transactions anymore that can see the data.
■ as soon as all undo action has been completed
■ For committed transactions till the time they are all-visible

UNDO WORKERS
■ Discarding the undo logs is performed by discard worker
■ Undo launcher checks the rollback_hash_table periodically
■ Spawn new undo workers to perform the rollback
■ Each spawned undo worker processes the rollback requests for a
particular database.

PREPARING DATA
■ Creating some random data
test=# SET temp_buffers TO '1 GB';
SET
test=# CREATE TEMP TABLE raw AS
SELECT id,
hashtext(id::text) as name,
random() * 10000 AS n, true AS b
FROM generate_series(1, 10000000) AS id;
SELECT 10000000

LOADING A HEAP
■ Populating a normal table
test=# timing
Timing is on.
test=# CREATE TABLE h1 (LIKE raw) USING heap;
CREATE TABLE
Time: 7.836 ms
test=# INSERT INTO h1 SELECT * FROM raw;
INSERT 0 10000000
Time: 7495.798 ms (00:07.496)

LOADING A ZHEAP
■ Mind the runtime
test=# CREATE TABLE z1 (LIKE raw) USING zheap;
CREATE TABLE
Time: 8.045 ms
test=# INSERT INTO z1 SELECT * FROM raw;
INSERT 0 10000000
Time: 27947.516 ms (00:27.948)

ZHEAP IN ACTION
test=# BEGIN;
BEGIN
test=*# SELECT pg_size_pretty(pg_relation_size('z1'));
pg_size_pretty
----------------
251 MB
(1 row)
test=*# UPDATE z1 SET id = id + 1;
UPDATE 10000000
test=*# SELECT pg_size_pretty(pg_relation_size('z1'));
pg_size_pretty
----------------
251 MB
(1 row)

UNDO IN ACTION
[hs@hs-MS-7817 undo]$ pwd
/home/hs/db13/base/undo
[hs@hs-MS-7817 undo]$ ls -l | tail -n 10
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003EC00000
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003ED00000
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003EE00000
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003EF00000
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003F000000
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003F100000
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003F200000
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003F300000
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003F400000
-rw-------. 1 hs hs 1048576 Oct 8 12:08 000001.003F500000

WHAT WE ARE WORKING ON
■ agree on ﬁnal design issues
■ ﬁx bugs in current code
■ large code base
■ not easy to handle
■ preparing a patch to move “undo” to core
■ “undo” is core infrastructure
We hope to bring this into core some day.

QUESTIONS?
Feel free to contact me!
M A I L hs@cybertec.at
P H O N E +43 2622 930 22-2
T W I T T E R @postgresql_007

Learn how zheap works

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (19)

Ähnlich wie Learn how zheap works

Ähnlich wie Learn how zheap works (20)

Mehr von EDB

Mehr von EDB (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Learn how zheap works