SlideShare a Scribd company logo
1 of 34
Download to read offline
PostgreSQL System Architecture
Heikki Linnakangas / VMware Ltd
July 4th, 2014
What is Software Architecture?
the fundamental organization of a system embodied in its
components, their relationships to each other, and to the
environment, and the principles guiding its design and
evolution. [IEEE 1471]
Software Architecture document
Not a single diagram
Multiple diagrams presenting different viewpoints
Process View
Development View
Functional View
Architecture vs. Design
Architecture = Design
just at a higher level
The Software Architect decides which decisions are part of
architecture, and which can be left to Design or
Implementation.
This presentation is based on my opinions.
Questions? Feel free to interrupt.
Part 0: Non-functional requirements
Data integrity
Performance
Scalability
Reliability
Interoperability
Portability
Extensibility
Maintainability (code readability)
Example: Extensibility
PostgreSQL has a highly extensible type system.
Behavior of all datatypes is defined by operators and operator
classes:
CREATE OPERATOR name (
PROCEDURE = function_name
[, LEFTARG = left_type ] [, RIGHTARG = right_type ]
[, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]
[, RESTRICT = res_proc ] [, JOIN = join_proc ]
[, HASHES ] [, MERGES ]
)
Including all built-in types
Example: Extensibility vs Performance
Built-in int4 < operator:
Datum
int4lt(PG_FUNCTION_ARGS)
{
int32 arg1 = PG_GETARG_INT32(0);
int32 arg2 = PG_GETARG_INT32(1);
PG_RETURN_BOOL(arg1 < arg2);
}
Example: Extensibility vs Performance
Extensibility makes e.g PostGIS possible.
It comes at a price:
Cannot assume anything about how an operator behaves,
except what’s specified in the CREATE OPERATOR
statement
No exceptions for built-in types.
No special case e.g. for sorting integers
Part 1: Process View
PostgreSQL consists of multiple process that communicate via
shared memory:
Postmaster
One backend process for each connection
Auxiliary processes
autovacuum launcher
autovacuum workers
background writer
startup process (= WAL recovery)
WAL writer
checkpointer
stats collector
logging collector
WAL archiver
custom background workers
Part 1: Process View
~$ ps ax | grep postgres
29794 pts/7 S 0:00 ~/pgsql.master/bin/postgres -D data
29799 ? Ss 0:00 postgres: checkpointer process
29800 ? Ss 0:00 postgres: writer process
29801 ? Ss 0:00 postgres: wal writer process
29802 ? Ss 0:00 postgres: autovacuum launcher proce
29803 ? Ss 0:00 postgres: stats collector process
29826 ? Ss 0:00 postgres: heikki postgres [local] i
Postmaster
Listens on the TCP socket (port 5432) for connections
forks new backends as needed
Parent of all other processes
listens for unexpected death of child processes
initiates start / stop of the system
Postmaster
Reliability is very important:
Does not touch shared memory
except a little bit
No locking between postmaster and other processes
Never blocks waiting on the client
Backends
One regular backend per connection. Communicates with other
server processes via shared memory
1. Postmaster launches a new backend process for each incoming
client connection.
2. Backend authenticates the client
3. Attaches to the specified database
4. Execute queries.
. . . when the client disconnects:
5. Detaches from shared memory
6. exit
Startup process
Launched once at system startup
Reads the pg control file and determines if recovery is needed
Performs WAL recovery
Auxiliary processes
Background writer:
Scans the buffer pool and writes out dirty buffers
WAL writer:
Writes out dirty WAL buffers
The system will function OK without these.
More auxiliary processes
These are not attached to shared memory:
stats collector
logging collector
WAL archiver
Part 2: Shared Memory
PostgreSQL processes use shared memory to communicate.
Fixed size, allocated at startup.
Divided into multiple small areas for different purposes.
>99% of the shared memory is used by the shared buffer pool
(shared buffers).
Most shared memory areas are protected by LWLocks
Shared Memory / Buffer Pool
Consists of a number of 8k buffers.
sized by shared buffers
e.g shared buffers=1GB –> 131072 buffers
Each buffer can hold one page
Buffer can be dirty
Buffer must be “pinned” when accessed, so that it’s not
evicted
Buffer can be locked in read or read/write mode.
Buffer replacement uses Clock algorithm
Shared Memory / Buffer Pool
typedef struct sbufdesc
{
BufferTag tag; /* ID of page contained in
BufFlags flags; /* see bit definitions abov
uint16 usage_count; /* usage counter for clock
unsigned refcount; /* # of backends holding pi
int wait_backend_pid; /* backend PID of p
slock_t buf_hdr_lock; /* protects the above field
int buf_id; /* buffer’s index number (f
int freeNext; /* link in freelist chain *
LWLock *io_in_progress_lock; /* to wait for I/O
LWLock *content_lock; /* to lock access to buffer
} BufferDesc;
Shared Memory / Proc Array
One entry for each backend (PGPROC struct, < 1KB). Sized by
max connections.
database ID connected to
Process ID
Current XID
stuff needed to wait on locks
Acquiring an MVCC snapshot scans the array, collecting the XIDs
of all processes.
Deadlock checker scans the lock information to form a locking
graph.
Shared Memory / Lock Manager
Lock manager handles mostly relation-level locks:
prevents a table from begin DROPped while it’s being used
hash table, sized by max locks per transaction *
max connections
deadlock detection
use “SELECT * FROM pg locks” to view
Aka known as heavy-weight locks. Data structures in shared
memory are protected by lightweight locks and spinlocks.
Shared Memory / Other stuff
Communication between backends and aux processes:
AutoVacuum Data
Checkpointer Data
Background Worker Data
Wal Receiver Ctl
Wal Sender Ctl
pgstat
Backend Status Array
Backend Application Name Buffer
Backend Client Host Name Buffer
Backend Activity Buffer
Shared Memory / Other stuff
Other caches (aka. SLRUs):
pg xlog
pg clog
pg subtrans
pg multixact
pg notify
Misc:
Prepared Transaction Table
Sync Scan Locations List
BTree Vacuum State
Serializable transactions stuff
Shared Memory / Other stuff
Communication with postmaster:
PMSignalState
Shared Cache Invalidation:
shmInvalBuffer
Locking
1. Lock manager (heavy-weight locks)
deadlock detection
many different lock levels
relation-level
pg locks system view
2. LWLocks (lightweight locks)
shared/exclusive
protects shared memory structures like buffers, proc array
no deadlock detection
3. Spinlocks
Platform-specific assembler code
typically single special CPU instruction
busy-waiting
Shared Memory / What’s not in shared memory
Catalog caches
Plan cache
work mem
Data structures in shared memory are simple, which is good for
robustness.
Part 3: Backend-private memory / Caches
Relcache:
information about tables or indexes
Catalog caches, e.g:
operators
functions
data types
Plan cache:
plans for prepared statements
queries in PL/pgSQL code
When a table/operator/etc. is dropped or altered, a “shared cache
invalidation” event is broadcast to all backends. Upon receiving
the event, the cache entry for the altered object is invalidated.
Backend-private memory
All memory is allocated in MemoryContexts. MemoryContexts
form a hierarchy:
TopMemoryContext (like malloc())
per-transaction context (reset at commit/rollback)
per-query context (reset at end of query)
per-expression context (reset ˜ between every function call)
Most of the time, you don’t need to free small allocations
individually.
Backend-private memory
TopMemoryContext: 86368 total in 12 blocks; 16392 free (37
TopTransactionContext: 8192 total in 1 blocks; 7256 free
TableSpace cache: 8192 total in 1 blocks; 3176 free (0 ch
Type information cache: 24248 total in 2 blocks; 3712 fre
Operator lookup cache: 24576 total in 2 blocks; 11848 fre
MessageContext: 32768 total in 3 blocks; 13512 free (0 ch
Operator class cache: 8192 total in 1 blocks; 1640 free (
smgr relation table: 24576 total in 2 blocks; 9752 free (
TransactionAbortContext: 32768 total in 1 blocks; 32736 f
Portal hash: 8192 total in 1 blocks; 1640 free (0 chunks)
PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks
PortalHeapMemory: 1024 total in 1 blocks; 784 free (0 c
ExecutorState: 8192 total in 1 blocks; 784 free (0 ch
printtup: 8192 total in 1 blocks; 8160 free (0 chun
ExprContext: 0 total in 0 blocks; 0 free (0 chunks)
Relcache by OID: 24576 total in 2 blocks; 13800 free (2 c
CacheMemoryContext: 516096 total in 6 blocks; 82480 free
Part 3: Error handling / ereport()
ereport(ERROR,
errmsg("relation "%s" in %s clause not found in F
thisrel->relname,
LCS_asString(lc->strength)),
parser_errposition(pstate, thisrel->location
Jumps out of the code being executed, like a C++ exception.
(uses longjmp())
Sends the error to the client
Prints the error to the log
Error handler aborts the (sub)transaction:
per-transaction memory context is reset
locks are released
Part 3: Error handling / FATAL
Typically for serious, unexpected, internal errors:
if (setitimer(ITIMER_REAL, &timeval, NULL) != 0)
elog(FATAL, "could not disable SIGALRM timer: %m");
Also when the client disconnects unexpectedly.
Like ERROR, jumps out of the code being executed, and
sends the message to client and log
Releases locks, detaches from shared memory, and terminates
the process.
The rest of the system continues running.
Part 3: Error handling / PANIC
Serious events that require a restart:
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not open transaction log file "%s"
Prints the error to the log
Terminates the process immediately with non-zero exit status.
Postmaster sees the unexpected death of the child process,
and sends SIGQUIT signal to all remaining child processes, to
terminate them.
After all the children have exited, postmaster restarts the
system (crash recovery)
Part 3: Backend programming
Hierarchical memory contexts
Error handling
->
Robustness
Thank you
PostgreSQL is easily to extend
PostgreSQL source code is easy to read
Start hacking! Questions?

More Related Content

What's hot

OOUG: Oracle transaction locking
OOUG: Oracle transaction lockingOOUG: Oracle transaction locking
OOUG: Oracle transaction locking
Kyle Hailey
 
UKOUG, Oracle Transaction Locks
UKOUG, Oracle Transaction LocksUKOUG, Oracle Transaction Locks
UKOUG, Oracle Transaction Locks
Kyle Hailey
 
了解Oracle rac brain split resolution
了解Oracle rac brain split resolution了解Oracle rac brain split resolution
了解Oracle rac brain split resolution
maclean liu
 
glance replicator
glance replicatorglance replicator
glance replicator
irix_jp
 

What's hot (20)

InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)
 
OOUG: Oracle transaction locking
OOUG: Oracle transaction lockingOOUG: Oracle transaction locking
OOUG: Oracle transaction locking
 
Performance schema 설정
Performance schema 설정Performance schema 설정
Performance schema 설정
 
UKOUG, Oracle Transaction Locks
UKOUG, Oracle Transaction LocksUKOUG, Oracle Transaction Locks
UKOUG, Oracle Transaction Locks
 
The first bug on Oracle Database 12c: how to create a pdb by cloning a remote...
The first bug on Oracle Database 12c: how to create a pdb by cloning a remote...The first bug on Oracle Database 12c: how to create a pdb by cloning a remote...
The first bug on Oracle Database 12c: how to create a pdb by cloning a remote...
 
Monetdb basic bat operation
Monetdb basic bat operationMonetdb basic bat operation
Monetdb basic bat operation
 
Non-Relational Postgres / Bruce Momjian (EnterpriseDB)
Non-Relational Postgres / Bruce Momjian (EnterpriseDB)Non-Relational Postgres / Bruce Momjian (EnterpriseDB)
Non-Relational Postgres / Bruce Momjian (EnterpriseDB)
 
MySQL Tokudb engine benchmark
MySQL Tokudb engine benchmarkMySQL Tokudb engine benchmark
MySQL Tokudb engine benchmark
 
HandlerSocket - A NoSQL plugin for MySQL
HandlerSocket - A NoSQL plugin for MySQLHandlerSocket - A NoSQL plugin for MySQL
HandlerSocket - A NoSQL plugin for MySQL
 
Mutexes 2
Mutexes 2Mutexes 2
Mutexes 2
 
OakTable World Sep14 clonedb
OakTable World Sep14 clonedb OakTable World Sep14 clonedb
OakTable World Sep14 clonedb
 
了解Oracle rac brain split resolution
了解Oracle rac brain split resolution了解Oracle rac brain split resolution
了解Oracle rac brain split resolution
 
Threads Advance in System Administration with Linux
Threads Advance in System Administration with LinuxThreads Advance in System Administration with Linux
Threads Advance in System Administration with Linux
 
PGDay.Amsterdam 2018 - Stefan Fercot - Save your data with pgBackRest
PGDay.Amsterdam 2018 - Stefan Fercot - Save your data with pgBackRestPGDay.Amsterdam 2018 - Stefan Fercot - Save your data with pgBackRest
PGDay.Amsterdam 2018 - Stefan Fercot - Save your data with pgBackRest
 
OpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developersOpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developers
 
Percona Live 2017 ­- Sharded cluster tutorial
Percona Live 2017 ­- Sharded cluster tutorialPercona Live 2017 ­- Sharded cluster tutorial
Percona Live 2017 ­- Sharded cluster tutorial
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
glance replicator
glance replicatorglance replicator
glance replicator
 
Vertica trace
Vertica traceVertica trace
Vertica trace
 

Similar to PG Day'14 Russia, PostgreSQL System Architecture, Heikki Linnakangas

Concurrency Learning From Jdk Source
Concurrency Learning From Jdk SourceConcurrency Learning From Jdk Source
Concurrency Learning From Jdk Source
Kaniska Mandal
 
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docxECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
tidwellveronique
 
ECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docxECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docx
tidwellveronique
 
ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata
EvonCanales257
 

Similar to PG Day'14 Russia, PostgreSQL System Architecture, Heikki Linnakangas (20)

Concurrency Learning From Jdk Source
Concurrency Learning From Jdk SourceConcurrency Learning From Jdk Source
Concurrency Learning From Jdk Source
 
RTOS implementation
RTOS implementationRTOS implementation
RTOS implementation
 
Event Processing and Integration with IAS Data Processors
Event Processing and Integration with IAS Data ProcessorsEvent Processing and Integration with IAS Data Processors
Event Processing and Integration with IAS Data Processors
 
Informix Data Streaming Overview
Informix Data Streaming OverviewInformix Data Streaming Overview
Informix Data Streaming Overview
 
Computer Science Homework Help
Computer Science Homework HelpComputer Science Homework Help
Computer Science Homework Help
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Sql server performance tuning and optimization
Sql server performance tuning and optimizationSql server performance tuning and optimization
Sql server performance tuning and optimization
 
Chapter 8 : Memory
Chapter 8 : MemoryChapter 8 : Memory
Chapter 8 : Memory
 
Lecture 5 process concept
Lecture 5   process conceptLecture 5   process concept
Lecture 5 process concept
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at Bristech
 
Gg steps
Gg stepsGg steps
Gg steps
 
Bab 4
Bab 4Bab 4
Bab 4
 
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docxECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
 
ECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docxECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docx
 
Various type of register
Various type of registerVarious type of register
Various type of register
 
Section08 memory management
Section08 memory managementSection08 memory management
Section08 memory management
 
ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 

More from pgdayrussia

PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
pgdayrussia
 
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав КовальPG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
pgdayrussia
 
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил ТюринPG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
pgdayrussia
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
pgdayrussia
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
pgdayrussia
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
pgdayrussia
 
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
pgdayrussia
 
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр КоротковPG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
pgdayrussia
 
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
pgdayrussia
 
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
pgdayrussia
 
PG Day'14 Russia, Модуль anyarray, Федор Сигаев
PG Day'14 Russia, Модуль anyarray, Федор СигаевPG Day'14 Russia, Модуль anyarray, Федор Сигаев
PG Day'14 Russia, Модуль anyarray, Федор Сигаев
pgdayrussia
 

More from pgdayrussia (12)

PG Day'14 Russia, Secure PostgreSQL Deployment, Magnus Hagander
PG Day'14 Russia, Secure PostgreSQL Deployment, Magnus HaganderPG Day'14 Russia, Secure PostgreSQL Deployment, Magnus Hagander
PG Day'14 Russia, Secure PostgreSQL Deployment, Magnus Hagander
 
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
 
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав КовальPG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
 
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил ТюринPG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
 
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
 
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр КоротковPG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
 
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
 
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
 
PG Day'14 Russia, Модуль anyarray, Федор Сигаев
PG Day'14 Russia, Модуль anyarray, Федор СигаевPG Day'14 Russia, Модуль anyarray, Федор Сигаев
PG Day'14 Russia, Модуль anyarray, Федор Сигаев
 

Recently uploaded

Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 

Recently uploaded (20)

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 

PG Day'14 Russia, PostgreSQL System Architecture, Heikki Linnakangas

  • 1. PostgreSQL System Architecture Heikki Linnakangas / VMware Ltd July 4th, 2014
  • 2. What is Software Architecture? the fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution. [IEEE 1471]
  • 3. Software Architecture document Not a single diagram Multiple diagrams presenting different viewpoints Process View Development View Functional View
  • 4. Architecture vs. Design Architecture = Design just at a higher level The Software Architect decides which decisions are part of architecture, and which can be left to Design or Implementation. This presentation is based on my opinions. Questions? Feel free to interrupt.
  • 5. Part 0: Non-functional requirements Data integrity Performance Scalability Reliability Interoperability Portability Extensibility Maintainability (code readability)
  • 6. Example: Extensibility PostgreSQL has a highly extensible type system. Behavior of all datatypes is defined by operators and operator classes: CREATE OPERATOR name ( PROCEDURE = function_name [, LEFTARG = left_type ] [, RIGHTARG = right_type ] [, COMMUTATOR = com_op ] [, NEGATOR = neg_op ] [, RESTRICT = res_proc ] [, JOIN = join_proc ] [, HASHES ] [, MERGES ] ) Including all built-in types
  • 7. Example: Extensibility vs Performance Built-in int4 < operator: Datum int4lt(PG_FUNCTION_ARGS) { int32 arg1 = PG_GETARG_INT32(0); int32 arg2 = PG_GETARG_INT32(1); PG_RETURN_BOOL(arg1 < arg2); }
  • 8. Example: Extensibility vs Performance Extensibility makes e.g PostGIS possible. It comes at a price: Cannot assume anything about how an operator behaves, except what’s specified in the CREATE OPERATOR statement No exceptions for built-in types. No special case e.g. for sorting integers
  • 9. Part 1: Process View PostgreSQL consists of multiple process that communicate via shared memory: Postmaster One backend process for each connection Auxiliary processes autovacuum launcher autovacuum workers background writer startup process (= WAL recovery) WAL writer checkpointer stats collector logging collector WAL archiver custom background workers
  • 10. Part 1: Process View ~$ ps ax | grep postgres 29794 pts/7 S 0:00 ~/pgsql.master/bin/postgres -D data 29799 ? Ss 0:00 postgres: checkpointer process 29800 ? Ss 0:00 postgres: writer process 29801 ? Ss 0:00 postgres: wal writer process 29802 ? Ss 0:00 postgres: autovacuum launcher proce 29803 ? Ss 0:00 postgres: stats collector process 29826 ? Ss 0:00 postgres: heikki postgres [local] i
  • 11. Postmaster Listens on the TCP socket (port 5432) for connections forks new backends as needed Parent of all other processes listens for unexpected death of child processes initiates start / stop of the system
  • 12. Postmaster Reliability is very important: Does not touch shared memory except a little bit No locking between postmaster and other processes Never blocks waiting on the client
  • 13. Backends One regular backend per connection. Communicates with other server processes via shared memory 1. Postmaster launches a new backend process for each incoming client connection. 2. Backend authenticates the client 3. Attaches to the specified database 4. Execute queries. . . . when the client disconnects: 5. Detaches from shared memory 6. exit
  • 14. Startup process Launched once at system startup Reads the pg control file and determines if recovery is needed Performs WAL recovery
  • 15. Auxiliary processes Background writer: Scans the buffer pool and writes out dirty buffers WAL writer: Writes out dirty WAL buffers The system will function OK without these.
  • 16. More auxiliary processes These are not attached to shared memory: stats collector logging collector WAL archiver
  • 17. Part 2: Shared Memory PostgreSQL processes use shared memory to communicate. Fixed size, allocated at startup. Divided into multiple small areas for different purposes. >99% of the shared memory is used by the shared buffer pool (shared buffers). Most shared memory areas are protected by LWLocks
  • 18. Shared Memory / Buffer Pool Consists of a number of 8k buffers. sized by shared buffers e.g shared buffers=1GB –> 131072 buffers Each buffer can hold one page Buffer can be dirty Buffer must be “pinned” when accessed, so that it’s not evicted Buffer can be locked in read or read/write mode. Buffer replacement uses Clock algorithm
  • 19. Shared Memory / Buffer Pool typedef struct sbufdesc { BufferTag tag; /* ID of page contained in BufFlags flags; /* see bit definitions abov uint16 usage_count; /* usage counter for clock unsigned refcount; /* # of backends holding pi int wait_backend_pid; /* backend PID of p slock_t buf_hdr_lock; /* protects the above field int buf_id; /* buffer’s index number (f int freeNext; /* link in freelist chain * LWLock *io_in_progress_lock; /* to wait for I/O LWLock *content_lock; /* to lock access to buffer } BufferDesc;
  • 20. Shared Memory / Proc Array One entry for each backend (PGPROC struct, < 1KB). Sized by max connections. database ID connected to Process ID Current XID stuff needed to wait on locks Acquiring an MVCC snapshot scans the array, collecting the XIDs of all processes. Deadlock checker scans the lock information to form a locking graph.
  • 21. Shared Memory / Lock Manager Lock manager handles mostly relation-level locks: prevents a table from begin DROPped while it’s being used hash table, sized by max locks per transaction * max connections deadlock detection use “SELECT * FROM pg locks” to view Aka known as heavy-weight locks. Data structures in shared memory are protected by lightweight locks and spinlocks.
  • 22. Shared Memory / Other stuff Communication between backends and aux processes: AutoVacuum Data Checkpointer Data Background Worker Data Wal Receiver Ctl Wal Sender Ctl pgstat Backend Status Array Backend Application Name Buffer Backend Client Host Name Buffer Backend Activity Buffer
  • 23. Shared Memory / Other stuff Other caches (aka. SLRUs): pg xlog pg clog pg subtrans pg multixact pg notify Misc: Prepared Transaction Table Sync Scan Locations List BTree Vacuum State Serializable transactions stuff
  • 24. Shared Memory / Other stuff Communication with postmaster: PMSignalState Shared Cache Invalidation: shmInvalBuffer
  • 25. Locking 1. Lock manager (heavy-weight locks) deadlock detection many different lock levels relation-level pg locks system view 2. LWLocks (lightweight locks) shared/exclusive protects shared memory structures like buffers, proc array no deadlock detection 3. Spinlocks Platform-specific assembler code typically single special CPU instruction busy-waiting
  • 26. Shared Memory / What’s not in shared memory Catalog caches Plan cache work mem Data structures in shared memory are simple, which is good for robustness.
  • 27. Part 3: Backend-private memory / Caches Relcache: information about tables or indexes Catalog caches, e.g: operators functions data types Plan cache: plans for prepared statements queries in PL/pgSQL code When a table/operator/etc. is dropped or altered, a “shared cache invalidation” event is broadcast to all backends. Upon receiving the event, the cache entry for the altered object is invalidated.
  • 28. Backend-private memory All memory is allocated in MemoryContexts. MemoryContexts form a hierarchy: TopMemoryContext (like malloc()) per-transaction context (reset at commit/rollback) per-query context (reset at end of query) per-expression context (reset ˜ between every function call) Most of the time, you don’t need to free small allocations individually.
  • 29. Backend-private memory TopMemoryContext: 86368 total in 12 blocks; 16392 free (37 TopTransactionContext: 8192 total in 1 blocks; 7256 free TableSpace cache: 8192 total in 1 blocks; 3176 free (0 ch Type information cache: 24248 total in 2 blocks; 3712 fre Operator lookup cache: 24576 total in 2 blocks; 11848 fre MessageContext: 32768 total in 3 blocks; 13512 free (0 ch Operator class cache: 8192 total in 1 blocks; 1640 free ( smgr relation table: 24576 total in 2 blocks; 9752 free ( TransactionAbortContext: 32768 total in 1 blocks; 32736 f Portal hash: 8192 total in 1 blocks; 1640 free (0 chunks) PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks PortalHeapMemory: 1024 total in 1 blocks; 784 free (0 c ExecutorState: 8192 total in 1 blocks; 784 free (0 ch printtup: 8192 total in 1 blocks; 8160 free (0 chun ExprContext: 0 total in 0 blocks; 0 free (0 chunks) Relcache by OID: 24576 total in 2 blocks; 13800 free (2 c CacheMemoryContext: 516096 total in 6 blocks; 82480 free
  • 30. Part 3: Error handling / ereport() ereport(ERROR, errmsg("relation "%s" in %s clause not found in F thisrel->relname, LCS_asString(lc->strength)), parser_errposition(pstate, thisrel->location Jumps out of the code being executed, like a C++ exception. (uses longjmp()) Sends the error to the client Prints the error to the log Error handler aborts the (sub)transaction: per-transaction memory context is reset locks are released
  • 31. Part 3: Error handling / FATAL Typically for serious, unexpected, internal errors: if (setitimer(ITIMER_REAL, &timeval, NULL) != 0) elog(FATAL, "could not disable SIGALRM timer: %m"); Also when the client disconnects unexpectedly. Like ERROR, jumps out of the code being executed, and sends the message to client and log Releases locks, detaches from shared memory, and terminates the process. The rest of the system continues running.
  • 32. Part 3: Error handling / PANIC Serious events that require a restart: ereport(PANIC, (errcode_for_file_access(), errmsg("could not open transaction log file "%s" Prints the error to the log Terminates the process immediately with non-zero exit status. Postmaster sees the unexpected death of the child process, and sends SIGQUIT signal to all remaining child processes, to terminate them. After all the children have exited, postmaster restarts the system (crash recovery)
  • 33. Part 3: Backend programming Hierarchical memory contexts Error handling -> Robustness
  • 34. Thank you PostgreSQL is easily to extend PostgreSQL source code is easy to read Start hacking! Questions?