Inside Postgres Shared Memory with EnterpriseDB

Inside PostgreSQL Shared Memory
BRUCE MOMJIAN
January, 2015
POSTGRESQL is an open-source, full-featured relational database.
This presentation gives an overview of the shared memory
structures used by Postgres.
Creative Commons Attribution License http://momjian.us/presentations
1 / 30

PostgreSQL the database…
◮ Open Source Object Relational DBMS since 1996
◮ Distributed under the PostgreSQL License
◮ Similar technical heritage as Oracle, SQL Server & DB2
◮ However, a strong adherence to standards (ANSI-SQL 2008)
◮ Highly extensible and adaptable design
◮ Languages, indexing, data types, etc.
◮ E.g. PostGIS, JSONB, SQL/MED
◮ Extensive use throughout the world for applications and
organizations of all types
◮ Bundled into Red Hat Enterprise Linux, Ubuntu, CentOS
and Amazon Linux
Inside PostgreSQL Shared Memory 2 / 30

PostgreSQL the community…
◮ Independent community led by a Core Team of six
◮ Large, active and vibrant community
◮ www.postgresql.org
◮ Downloads, Mailing lists, Documentation
◮ Sponsors sampler:
◮ Google, Red Hat, VMWare, Skype, Salesforce, HP and
EnterpriseDB
◮ http://www.postgresql.org/community/

EnterpriseDB the company…
◮ The worldwide leader of Postgres based products and
services founded in 2004
◮ Customers include 50 of the Fortune 500 and 98 of the
Forbes Global 2000
◮ Enterprise offerings:
◮ PostgreSQL Support, Services and Training
◮ Postgres Plus Advanced Server with Oracle Compatibility
◮ Tools for Monitoring, Replication, HA, Backup & Recovery
Community
◮ Citizenship
◮ Contributor of key features: Materialized Views, JSON, &
more
◮ Nine community members on staff

EnterpriseDB the company…

Outline
1. File storage format
2. Shared memory creation
3. Shared buffers
4. Row value access
5. Locking
6. Other structures

File System /data
Postgres
Postgres
Postgres
/data

File System /data/base
Postgres
Postgres
Postgres
/data
/pg_clog
/pg_multixact
/pg_subtrans
/pg_tblspc
/pg_xlog
/global
/pg_twophase
/base

File System /data/base/db
Postgres
Postgres
Postgres
/data /base /16385 (production)
/1 (template1)
/17982 (devel)
/16821 (test)
/21452 (marketing)

File System /data/base/db/table
Postgres
Postgres
Postgres
/data /base /16385 /24692 (customer)
/27214 (order)
/25932 (product)
/25952 (employee)
/27839 (part)

File System Data Pages
Postgres
Postgres
Postgres
8k 8k 8k 8k
/data /base /16385 /24692

Data Pages
Postgres
Postgres
Postgres
Page Header Item Item Item
Tuple
Tuple Tuple Special
8K
8k 8k 8k 8k
/data /base /16385 /24692

File System Block Tuple
Postgres
Postgres
Postgres
Tuple
Tuple Tuple Special
8K
8k 8k 8k 8k
/data /base /16385 /24692
Tuple

File System Tuple
hoff − length of tuple header
infomask − tuple flags
natts − number of attributes
ctid − tuple id (page / item)
cmax − destruction command id
xmin − creation transaction id
xmax − destruction transaction id
cmin − creation command id
bits − bit map representing NULLs
OID − object id of tuple (optional)
Tuple
Value Value ValueValue Value Value ValueHeader
int4in(’9241’)
textout()
’Martin’

Tuple Header C Structures
typedef struct HeapTupleFields
{
TransactionId t_xmin; /* inserting xact ID */
TransactionId t_xmax; /* deleting or locking xact ID */
union
{
CommandId t_cid; /* inserting or deleting command ID, or both */
TransactionId t_xvac; /* VACUUM FULL xact ID */
} t_field3;
} HeapTupleFields;
typedef struct HeapTupleHeaderData
{
union
{
HeapTupleFields t_heap;
DatumTupleFields t_datum;
} t_choice;
ItemPointerData t_ctid; /* current TID of this or newer tuple */
/* Fields below here must match MinimalTupleData! */
uint16 t_infomask2; /* number of attributes + various flags */
uint16 t_infomask; /* various flag bits, see below */
uint8 t_hoff; /* sizeof header incl. bitmap, padding */
/* ^ − 23 bytes − ^ */
bits8 t_bits[1]; /* bitmap of NULLs −− VARIABLE LENGTH */
/* MORE DATA FOLLOWS AT END OF STRUCT */
} HeapTupleHeaderData;

Shared Memory Creation
postmaster postgres postgres
Program (Text)
Data
Program (Text)
Data
Shared Memory
Program (Text)
Data
Shared Memory Shared Memory
Stack Stack
fork()
Stack

Shared Memory
Shared Buffers
Proc Array
PROC
Multi−XACT Buffers
Two−Phase Structs
Subtrans Buffers
CLOG Buffers
XLOG Buffers
Shared Invalidation
Lightweight Locks
Lock Hashes
Auto Vacuum
Btree Vacuum
Buffer Descriptors
Background Writer Synchronized Scan
Semaphores
Statistics
LOCK
PROCLOCK

Shared Buffers
Tuple
Tuple Tuple Special
8K
Postgres
Postgres
Postgres
8k 8k 8k 8k
/data /base /16385 /24692
Shared Buffers
LWLock − for page changes
Pin Count − prevent page replacement
read()
write()
8k 8k 8k
Buffer Descriptors

HeapTuples
Shared Buffers
Postgres
HeapTuple
C pointer
hoff − length of tuple header
infomask − tuple flags
natts − number of attributes
ctid − tuple id (page / item)
cmax − destruction command id
xmin − creation transaction id
xmax − destruction transaction id
cmin − creation command id
bits − bit map representing NULLs
OID − object id of tuple (optional)
Tuple
Value Value ValueValue Value Value ValueHeader
int4in(’9241’)
textout()
’Martin’
Tuple
Tuple Tuple Special
8K
8k 8k 8k

Finding A Tuple Value in C
Datum
nocachegetattr(HeapTuple tuple,
int attnum,
TupleDesc tupleDesc,
bool *isnull)
{
HeapTupleHeader tup = tuple−>t_data;
Form_pg_attribute *att = tupleDesc−>attrs;
{
int i;
/*
* Note − This loop is a little tricky. For each non−null attribute,
* we have to first account for alignment padding before the attr,
* then advance over the attr based on its length. Nulls have no
* storage and no alignment padding either. We can use/set
* attcacheoff until we reach either a null or a var−width attribute.
*/
off = 0;
for (i = 0;; i++) /* loop exit is at "break" */
{
if (HeapTupleHasNulls(tuple) && att_isnull(i, bp))
continue; /* this cannot be the target att */
if (att[i]−>attlen == −1)
off = att_align_pointer(off, att[i]−>attalign, −1,
tp + off);
else
/* not varlena, so safe to use att_align_nominal */
off = att_align_nominal(off, att[i]−>attalign);
if (i == attnum)
break;
off = att_addlength_pointer(off, att[i]−>attlen, tp + off);
}
}
return fetchatt(att[attnum], tp + off);
}

Value Access in C
#define fetch_att(T,attbyval,attlen)
(
(attbyval) ?
(
(attlen) == (int) sizeof(int32) ?
Int32GetDatum(*((int32 *)(T)))
:
(
(attlen) == (int) sizeof(int16) ?
Int16GetDatum(*((int16 *)(T)))
:
(
AssertMacro((attlen) == 1),
CharGetDatum(*((char *)(T)))
)
)
)
:
PointerGetDatum((char *) (T))
)

Test And Set Lock
Can Succeed Or Fail
0/1
1
0
Success
Was 0 on exchange
Lock already taken
Was 1 on exchange
Failure
1
1

Test And Set Lock
x86 Assembler
static __inline__ int
tas(volatile slock_t *lock)
{
register slock_t _res = 1;
/*
* Use a non−locking test before asserting the bus lock. Note that the
* extra test appears to be a small loss on some x86 platforms and a small
* win on others; it’s by no means clear that we should keep it.
*/
__asm__ __volatile__(
" cmpb $0,%1 n"
" jne 1f n"
" lock n"
" xchgb %0,%1 n"
"1: n"
: "+q"(_res), "+m"(*lock)
:
: "memory", "cc");
return (int) _res;
}

Spin Lock
Always Succeeds
0/1
1
Success
Was 0 on exchange
Failure
Was 1 on exchange
Lock already taken
Sleep of increasing duration
0 1
1
Spinlocks are designed for short-lived locking operations, like
access to control structures. They are not be used to protect code
that makes kernel calls or other heavy operations.

Light Weight Locks
Shared Buffers
Proc Array
PROC
Two−Phase Structs
Subtrans Buffers
CLOG Buffers
XLOG Buffers
Shared Invalidation
Lightweight Locks
Auto Vacuum
Btree Vacuum
Semaphores
Statistics
LOCK
PROCLOCK
Lock Hashes
Buffer Descriptors
Sleep On Lock
Light weight locks attempt to acquire the lock, and go to sleep on
a semaphore if the lock request fails. Spinlocks control access to
the light weight lock control structure.

Database Object Locks
PROCLOCKPROC LOCK
Lock Hashes

Proc
Proc Array
used usedusedempty empty empty
PROC

Other Shared Memory Structures
Shared Buffers
Proc Array
PROC
Two−Phase Structs
Subtrans Buffers
CLOG Buffers
XLOG Buffers
Shared Invalidation
Lightweight Locks
Lock Hashes
Auto Vacuum
Btree Vacuum
Buffer Descriptors
Semaphores
Statistics
LOCK
PROCLOCK

Additional Resources…
◮ Postgres Downloads:
◮ www.enterprisedb.com/downloads
◮ Product and Services information:
◮ info@enterprisedb.com

Conclusion
http://momjian.us/presentations Pink Floyd: Wish You Were Here

Inside Postgres Shared Memory with EnterpriseDB

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Mehr von EDB

Mehr von EDB (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Inside Postgres Shared Memory with EnterpriseDB