3. What is PostgreSQL in the first place?
- “PostgreSQL is an object-relational database management system (ORDBMS) based on
POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer
Science Department.” -- PostgreSQL Documentation, By postgresql.org.
- “PostgreSQL (pronounced Post-Gres-Q-L), or postgres for short, is an open source
object-relational-database management system.” -- Learning PostgreSQL 11 (Third
Edition), A beginner's guide to building high-performance PostgreSQL database solutions,
By Salahaldin Juba, Andrey Volkov.
4. If You Are Wondering...
- we have heard of relational databases; how does this differ from them?
- the definitions call postgres an “Object Relational Database Management
System”; what does this imply?
- or, we know Object Oriented Principles; does PostgreSQL adopt OOP paradigms like an
Object Oriented Language does?
5. Detour - Database
- in simplest words:
- organized collection of valid data where new records can be added or an existing
record can be accessed, modified or removed
6. Detour - DataBase Management System
- can be seen as the gatekeeper of a database, basically an interface that:
- offers and controls access to read, add, update or remove data
- ensures integrity by imposing given constraints
- ensures concurrency and transactions
- enables remote access to database
- ensures data recovery in case of any kind of failure
7. Detour - Relational DBMS
- a group of related data is stored in tabular form, where:
- each property is a column of the table (attribute is the more common term)
- every single instance having those properties is a row or tuple
- the set of attributes that defines the table is known as its schema
- two tables can be related through a common attribute, eg: foreign key
- use when:
- you know your data model well and the data is structured
- data pattern is fixed
- all of your entities have fixed attributes that are not going to change
- you need immediate ACID compliant transactions
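The tabular model above can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL, and the author/book tables and their columns are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a relational DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Each column is an attribute; the column list is the schema of the table.
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE book ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES author(id))"  # foreign key
)

# Each inserted row is one tuple.
conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO book (id, title, author_id) VALUES (10, 'Notes', 1)")

# Relate the two tables through the common attribute.
rows = conn.execute(
    "SELECT book.title, author.name"
    " FROM book JOIN author ON author.id = book.author_id"
).fetchall()


def insert_orphan_book():
    """Try to insert a book whose author does not exist; True if rejected."""
    try:
        conn.execute(
            "INSERT INTO book (id, title, author_id) VALUES (11, 'Ghost', 99)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Here `rows` holds the joined (title, name) pairs, and `insert_orphan_book()` shows the DBMS enforcing the foreign key constraint rather than the application.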
8. Detour - Object Relational DBMS
- OOP concepts such as objects, classes and inheritance are supported in schemas, relations and
even in queries
- supports custom data types and nested (composite) data types, as in OOP
- even functions or operators can be overloaded to facilitate polymorphism
10. PostgreSQL - Evolution
- evolved from the Ingres project at the University of California, Berkeley, led by Michael
Stonebraker
- that’s why the original name POSTGRES is read as Post-Ingres
12. Why Use PostgreSQL?
- can support both relational and non-relational data types (eg: JSON)
- high read/write throughput
- multi-version concurrency control (MVCC)
- parallel query execution using multiple cores
- non-blocking index creation (CREATE INDEX CONCURRENTLY)
- partial indexes available (index only the rows that satisfy a WHERE predicate)
16. Postgres Built-in Applications
- ships with a number of client and server applications
- uses client/server model
- client and server can reside on different hosts and communicate via TCP/IP or, on the same
host, via a unix domain socket
- can handle multiple concurrent connections from clients
- for each client connection the server forks a new process
17. Postgres Client Applications
- frontend application that requests some database action
- psql:
- offers interactive terminal to write queries and get response from postgres
- queries can be read from a file or passed as command line arguments(cla) as well
- pg_config
- prints the configuration parameters of the installed version
- pgbench
- runs benchmark by executing a number of dummy transactions from a number of
dummy clients
18. Postgres Client Applications(continued...)
- clusterdb
- re-clusters previously clustered tables in the specified databases
- createdb
- creates a new database
- nothing but a wrapper of CREATE DATABASE command
- dropdb
- removes the specified database
- nothing but a wrapper of DROP DATABASE command
19. Postgres Client Applications(continued...)
- createuser
- creates a new user
- just a wrapper of CREATE ROLE command
- dropuser
- removes an existing user
- just a wrapper of DROP ROLE command
- vacuumdb(garbage collector and optionally analyzer)
- cleans dead tuples from all(or specified) tables of a database that the user has permission
to vacuum, and optionally generates statistics about the database for the query planner
- a wrapper of VACUUM command
- full list here
20. Postgres Server Applications(continued...)
- backend application
- postgres
- accepts connection from client applications
- resolves client requests
- manages database files
- initdb
- creates a new pg cluster
21. Postgres Server Applications(continued...)
- pg_ctl
- initializes, starts, stops or controls a postgres server
- pg_upgrade
- upgrades a postgres server instance to a newer major version
- pg_waldump
- displays the write-ahead log(wal) of a cluster in human readable form
- full list here
24. PostgreSQL Forked Process(continued...)
- follows the “process per user” model, ie: one backend process per client connection
- one client process gets connected to exactly one server process
- the master(postmaster) process spawns a new backend server process each time a new connection
is requested
26. PostgreSQL Forked Process(continued...)
- master process forks other background processes at start-up
- walwriter
- manages Write Ahead Log, periodically flushing the wal buffer to the wal files on disk
- any change to data files(table or index) is logged first into the wal buffer
- ensures data integrity
- in case of system crash, roll-forward(or REDO) is done using the log records
- checkpointer
- records a checkpoint in the wal sequence
- flushes all dirty data pages to disk so that the data files reflect the log up to that checkpoint
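The interplay of the wal, the checkpoint and roll-forward described above can be sketched with a toy model. `ToyStore` and all of its members are hypothetical names; real WAL records describe page-level changes, not key/value pairs.

```python
class ToyStore:
    """Toy model of write-ahead logging with checkpoints (not PostgreSQL code)."""

    def __init__(self):
        self.wal = []            # durable, append-only log of (lsn, key, value)
        self.buffers = {}        # in-memory pages, lost on a crash
        self.data_files = {}     # on-disk state as of the last checkpoint
        self.checkpoint_lsn = 0  # position in the wal covered by data_files

    def write(self, key, value):
        lsn = len(self.wal) + 1
        self.wal.append((lsn, key, value))  # log the change first
        self.buffers[key] = value           # then modify the in-memory page
        return lsn

    def checkpoint(self):
        # Flush dirty pages so the data files reflect the log up to this point.
        self.data_files.update(self.buffers)
        self.checkpoint_lsn = len(self.wal)

    def crash(self):
        self.buffers = {}  # memory contents are gone, wal and data files remain

    def recover(self):
        # Roll forward (REDO): replay log records written after the checkpoint.
        self.buffers = dict(self.data_files)
        for lsn, key, value in self.wal:
            if lsn > self.checkpoint_lsn:
                self.buffers[key] = value
        return self.buffers


store = ToyStore()
store.write("a", 1)
store.checkpoint()
store.write("a", 2)   # logged, but never flushed to the data files
store.write("b", 3)
store.crash()
recovered = store.recover()
```

After the simulated crash the data files still hold only the checkpointed value, while `recovered` contains the later changes because they were replayed from the log.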
27. PostgreSQL Forked Process(continued...)
- background writer
- writes specific dirty(new or modified) shared buffers to disk
- may cause a net increase in I/O load, as a repeatedly dirtied page might otherwise be written only
once per checkpoint whereas the bg writer may write it several times
- vacuum writer(autovacuum launcher and its workers)
- postgres uses pseudo-deletion method
- if deleted or updated, a tuple is not removed from physical storage of that table
- these obsolete tuples are only marked as deleted
28. PostgreSQL Forked Process(continued...)
- vacuum writer reclaims space consumed by dead tuples
- also updates the visibility map(_vm)
- if run with ANALYZE, updates the pg_statistic catalog which the query planner uses to choose the
most effective execution plan
- stats collector
- collects and reports server activity
- counts access to table and index, number of rows of a table, vacuum and analyze stats etc
29. PostgreSQL Forked Process(continued...)
- logical replication launcher
- doesn’t replicate byte by byte like physical(streaming) replication
- replicates one database at a time and only committed row changes, not physical changes such
as those made by vacuum
- works in publisher-subscriber model
- unlike streaming replication, multi-master is possible
- DDL is not handled and so manual table creation is required at subscriber end
- columns are matched by name; column order and number of columns may differ
- changes of a transaction are sent only after it commits, so a big transaction can add overhead
- server processes communicate with each other via semaphores and shared memory to ensure
data integrity
32. Memory Layout(continued...)
- shared memory
- accessible from all server processes of the instance, ie: the backend processes serving client
connections and the background processes
- shared buffer, WAL buffer, CLog buffer etc
- local memory
- allocated and used by a specific process or subsystem
- vacuum buffer, temp buffer, work memory etc.
33. Memory Layout(continued...)
Shared Buffer
- where data pages are read into and modified
- blocks modified here but not yet written to disk are called dirty data or dirty blocks; once
permanently written to disk they are part of the data files
- shared memory
- can’t be resized unless the running postgres server instance is restarted
- config parameter:
- shared_buffers: 128MB by default
34. Memory Layout(continued...)
WAL Buffer
- separate buffer to keep transaction logs
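- wal data is first written to wal buffer before being written to the wal files on disk
- shared memory
- usually 1/32nd of shared buffer in size (not less than 64kB, not more than one WAL segment)
- config parameter:
- wal_buffers: -1 by default, ie: sized automatically (4MB with the default shared_buffers)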
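How the automatic size follows from shared_buffers can be sketched as below. The rule coded here is the documented behaviour of `wal_buffers = -1` (1/32 of shared_buffers, at least 64kB, at most one WAL segment); the function name and the 16MB segment size default are assumptions for illustration.

```python
KB = 1024
MB = 1024 * KB


def auto_wal_buffers(shared_buffers_bytes, wal_segment_bytes=16 * MB):
    """Size chosen when wal_buffers is -1 (hypothetical helper, in bytes)."""
    size = shared_buffers_bytes // 32              # 1/32 of shared_buffers
    return max(64 * KB, min(size, wal_segment_bytes))  # clamp to [64kB, segment]


default_size = auto_wal_buffers(128 * MB)   # default shared_buffers
large_size = auto_wal_buffers(1024 * MB)    # capped at one WAL segment
tiny_size = auto_wal_buffers(1 * MB)        # raised to the 64kB floor
```

With the default 128MB of shared buffers this gives the 4MB quoted on the slide.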
35. Memory Layout(continued...)
CLog Buffer
- contains transaction metadata
- keeps status of transactions
- can tell if a transaction is committed or not
- shared memory
Lock Space
- all locks are stored here
- shared memory
36. Memory Layout(continued...)
Vacuum Buffer
- local memory: used by auto vacuum worker
- total size can reach autovacuum_work_mem times autovacuum_max_workers
- config parameter:
- autovacuum_max_workers: 3 by default
- autovacuum_work_mem: -1 by default, which means maintenance_work_mem(64MB by default)
is used; minimum 1MB otherwise
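As a rough worked example of the arithmetic above, the sketch below computes the upper bound on vacuum memory. The function name is hypothetical; the parameter names and defaults are the ones listed on the slide.

```python
MB = 1024 * 1024


def autovacuum_memory_ceiling(
    autovacuum_work_mem=-1,
    maintenance_work_mem=64 * MB,
    autovacuum_max_workers=3,
):
    """Upper bound (bytes) on memory all autovacuum workers may use together."""
    # -1 means: fall back to maintenance_work_mem for each worker.
    if autovacuum_work_mem == -1:
        per_worker = maintenance_work_mem
    else:
        per_worker = autovacuum_work_mem
    return per_worker * autovacuum_max_workers


default_ceiling = autovacuum_memory_ceiling()
tuned_ceiling = autovacuum_memory_ceiling(autovacuum_work_mem=16 * MB)
```

With the defaults this is 3 workers times 64MB, which is why raising maintenance_work_mem also raises the autovacuum ceiling unless autovacuum_work_mem is set separately.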
37. Memory Layout(continued...)
Work Memory
- local memory: used by the executor or query workspaces
- memory to be used when a sort(query example: ORDER BY, DISTINCT, MERGE JOIN) or
hash(query example: HASH JOIN, IN) operation is executed
- config parameter:
- work_mem: 4MB by default
38. Memory Layout(continued...)
Maintenance Memory
- local memory
- memory allocated for maintenance operations like: CREATE INDEX, VACUUM, REINDEX, or
while adding FOREIGN KEY
- config parameter:
- maintenance_work_mem: 64MB by default
39. Memory Layout(continued...)
Temp Buffer
- local memory: used by the executor
- space where temporary tables of a session are kept
- config parameter:
- temp_buffers: 8MB for each session by default
42. Query Execution Phases
- client gets a connection to transmit a query to the server and to receive the results
- parser stage checks the query for correct syntax and creates a query tree
- traffic cop subsystem determines the query type
- utility query is passed to the utilities subsystem
- rewriter takes the query tree and looks for any rules(eg: views) to apply to the query tree
- planner/optimizer takes the (rewritten) query tree and creates a query plan
- first creates all possible paths leading to the same result
- next the cost for the execution of each path is estimated
- finally the cheapest path is chosen
- executor recursively steps through the plan tree and retrieves rows in the way represented
by the plan
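The planner step (enumerate paths, estimate cost, keep the cheapest) can be illustrated with a deliberately crude sketch. The three constants mirror the default values of the seq_page_cost, random_page_cost and cpu_tuple_cost settings; the cost formulas and function names are simplifying assumptions, not the planner's real model.

```python
SEQ_PAGE_COST = 1.0      # default seq_page_cost
RANDOM_PAGE_COST = 4.0   # default random_page_cost
CPU_TUPLE_COST = 0.01    # default cpu_tuple_cost


def seq_scan_cost(pages, rows):
    # Read every page sequentially and process every row.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST


def index_scan_cost(rows, selectivity):
    # Crude assumption: every matching row costs one random page read.
    matched = rows * selectivity
    return matched * RANDOM_PAGE_COST + matched * CPU_TUPLE_COST


def cheapest_path(pages, rows, selectivity):
    """Return the name of the candidate path with the lowest estimated cost."""
    paths = {
        "seq scan": seq_scan_cost(pages, rows),
        "index scan": index_scan_cost(rows, selectivity),
    }
    return min(paths, key=paths.get)


selective = cheapest_path(pages=1000, rows=100_000, selectivity=0.001)
unselective = cheapest_path(pages=1000, rows=100_000, selectivity=0.5)
```

Even this toy model reproduces the familiar behaviour: a highly selective predicate favours the index, while a predicate matching half the table favours a sequential scan.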
45. Logical Layout(continued...)
database cluster
- collection of databases within the running postgres instance
- mainly resides in data area(eg: $PGDATA - /usr/local/pgsql/data)
- multiple clusters managed by different postgres instances can exist on the same machine
- don’t mix it up with physical database server or node cluster
46. Logical Layout(continued...)
database object:
- a data structure used to store or reference data
- tablespaces, tables(heap), functions, views, indexes, etc and even the database itself
- identified by object identifier or OID, an unsigned 4 byte integer
- respective oids are stored in system catalog(pg_catalog) schema
- for instance: when a new database or a new table is created, all its metadata is stored in
the pg_catalog.pg_database table or the pg_catalog.pg_class table respectively and so on
49. Physical Layout(continued...)
pg_hba.conf
- stands for host based authentication
- created when initdb is called
- default location is the data area, but it can stay elsewhere as well
- configuration file to control client authentication
50. Physical Layout(continued...)
pg_ident.conf:
- configuration file to control postgres user name mapping
- used along with pg_hba.conf file
- maps the system user name(obtained from some external authentication system like ident or
GSSAPI) of the client trying to connect to postgres server to a postgres user
- default location is the data area, but it can stay elsewhere as well
PG_VERSION
- containing the major version number of PostgreSQL
51. Physical Layout(continued...)
postgresql.auto.conf
- system configurations changed using `ALTER SYSTEM SET
<confParameter>=<confValue>;` sql command are written here and override postgresql.conf
- an entry is removed again when the parameter is reset(`ALTER SYSTEM RESET <confParameter>;`)
postgresql.conf
- server configuration file
- can stay elsewhere as well
postmaster.opts:
- file containing command-line options used at server start time
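The precedence between the two configuration files can be sketched as follows: postgresql.auto.conf is read after postgresql.conf, so its entries win. The parser is a simplified assumption (it only handles `name = value` lines with optional single quotes and `#` comments), and the sample file contents are made up.

```python
def parse_conf(text):
    """Parse simple 'name = value' lines; a simplified, hypothetical parser."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'")
    return settings


def effective_settings(postgresql_conf, auto_conf):
    """Merge both files; postgresql.auto.conf is applied last and overrides."""
    merged = parse_conf(postgresql_conf)
    merged.update(parse_conf(auto_conf))
    return merged


# Made-up file contents for illustration.
POSTGRESQL_CONF = "work_mem = 4MB  # default\nshared_buffers = 128MB\n"
AUTO_CONF = "# Do not edit this file manually!\nwork_mem = '16MB'\n"

settings = effective_settings(POSTGRESQL_CONF, AUTO_CONF)
```

In this sample `work_mem` comes from the auto file while `shared_buffers` still comes from postgresql.conf.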
52. Physical Layout(continued...)
postmaster.pid:
- keeps track of the following, each on a separate line
- pid of the currently running postgres server instance
- path of data area
- server start timestamp in epoch time
- server port number
- unix socket directory path
- first valid ip or hostname of listen_addresses
- shared memory segment id
- server status
- file is absent if no server instance is running
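Because every item sits on its own line in a fixed order, the file is easy to read programmatically. The sketch below assumes the line order listed above; the field names and the sample contents are hypothetical.

```python
# Field names are labels chosen for this sketch, in the order listed above.
FIELDS = [
    "pid",
    "data_dir",
    "start_time",
    "port",
    "socket_dir",
    "listen_address",
    "shmem_id",
    "status",
]


def parse_postmaster_pid(text):
    """Map each line of a postmaster.pid file to a named field."""
    lines = text.splitlines()
    return {name: line.strip() for name, line in zip(FIELDS, lines)}


# Made-up sample contents, not taken from a real server.
SAMPLE = """12345
/usr/local/pgsql/data
1700000000
5432
/tmp
localhost
  5432001     98304
ready
"""

info = parse_postmaster_pid(SAMPLE)
```

A monitoring script could use `info["pid"]` and `info["status"]` to check whether the instance is up, keeping in mind that the file does not exist at all when no server is running.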
54. Physical Layout - DB Cluster(continued...)
- base directory contains the databases as subdirectories named after the corresponding
database oid, for databases created on the pg_default tablespace
- tables or databases created on a different tablespace are stored under that tablespace's own
location instead, like the one here(test_db_2 → 16412 is created with default tablespace
test_table_spcace → 16410)
55. Physical Layout - Table Files
- when a table is created, a file having the filenode of the table as the filename is created
- max table size is 32TB(if page size is 8KB)
- divided into 1GB sized segments by default
- each segment file from the second one will be named as <filenode>.1, <filenode>.2
and so on
- usually filenode is the same as oid unless TRUNCATE, REINDEX, CLUSTER, VACUUM FULL or some
forms of ALTER TABLE are applied to that table
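The numbers above fit together as simple arithmetic, sketched below. It assumes 8KB pages, 1GB segments and roughly 2^32 addressable pages per table; `locate_page` and the filenode 16384 are hypothetical.

```python
PAGE_SIZE = 8 * 1024                            # 8KB pages
SEGMENT_SIZE = 1024 ** 3                        # 1GB segment files
PAGES_PER_SEGMENT = SEGMENT_SIZE // PAGE_SIZE   # pages held by one segment
MAX_TABLE_BYTES = 2 ** 32 * PAGE_SIZE           # 32-bit page numbers -> about 32TB


def locate_page(filenode, page_no):
    """Return (segment file name, byte offset) for a page of a table."""
    segment = page_no // PAGES_PER_SEGMENT
    offset = (page_no % PAGES_PER_SEGMENT) * PAGE_SIZE
    # The first segment has no suffix, later ones are <filenode>.1, .2, ...
    name = str(filenode) if segment == 0 else f"{filenode}.{segment}"
    return name, offset


first = locate_page(16384, 0)
second_segment = locate_page(16384, PAGES_PER_SEGMENT)
```

So the 32TB limit is a consequence of the page size, while the segment size only decides how many files a large table is split into.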
57. Physical Layout - Table Files(continued…)
- each table segment contains several pages(8KB sized)
- each page starts with some page header(24 bytes) followed by item pointers(4 bytes
each) and ends with the actual tuples(or items) and special space; the space in between
the item pointers and the actual items is called free space
- a tuple can’t span multiple pages; when a row grows too large(beyond roughly 2KB), its big
attribute values are compressed and/or moved to a separate TOAST(The Oversized-Attribute
Storage Technique) table, which has its own file
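The page layout above implies a simple formula for the free space of a page, sketched here. It is a simplified model that ignores alignment padding of tuples; the function name and the sample tuple sizes are hypothetical.

```python
PAGE_SIZE = 8192          # 8KB page
PAGE_HEADER_SIZE = 24     # fixed page header
ITEM_POINTER_SIZE = 4     # one item pointer per tuple


def free_space(tuple_sizes, special_size=0):
    """Bytes left between the item pointers and the tuples (no padding)."""
    # Item pointers grow forward from the header ...
    pointers_end = PAGE_HEADER_SIZE + ITEM_POINTER_SIZE * len(tuple_sizes)
    # ... while tuples grow backward from the end of the page.
    tuples_start = PAGE_SIZE - special_size - sum(tuple_sizes)
    if tuples_start < pointers_end:
        raise ValueError("tuples do not fit into a single page")
    return tuples_start - pointers_end


empty_page = free_space([])
two_tuples = free_space([100, 200])
```

Each new tuple therefore costs its own size plus 4 bytes for the item pointer, which is why many tiny rows fill a page faster than their data size alone suggests.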
58. Physical Layout - Table Files(continued...)
- a table may contain an _fsm(Free Space Map) file and a _vm(Visibility Map) file
- when updating a tuple, postgres doesn’t overwrite it; it creates a new version instead,
marking the old one as deleted
- when deleting a tuple, postgres uses a policy of pseudo-deletion; it just marks the
existing tuple as deleted
- later the vacuum worker finds those dead tuples, recognises their space as free space
and creates(if there’s none) or updates the _fsm and _vm files
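The update, delete and vacuum behaviour above can be imitated with a toy heap. `ToyHeap` is a hypothetical model: real PostgreSQL tracks tuple visibility with transaction ids rather than a boolean flag, and the free list here only loosely plays the role of the _fsm.

```python
class ToyHeap:
    """Toy table storage with pseudo-deletion (not PostgreSQL code)."""

    def __init__(self):
        self.slots = []  # each slot: {"value": ..., "dead": bool} or None if free

    def insert(self, value):
        # Reuse a slot that vacuum has freed, otherwise extend the table.
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = {"value": value, "dead": False}
                return i
        self.slots.append({"value": value, "dead": False})
        return len(self.slots) - 1

    def delete(self, i):
        self.slots[i]["dead"] = True  # only marked, storage is not released

    def update(self, i, value):
        self.delete(i)                # old version becomes a dead tuple
        return self.insert(value)     # new version is written elsewhere

    def vacuum(self):
        # Reclaim the space of dead tuples so later inserts can reuse it.
        reclaimed = 0
        for i, slot in enumerate(self.slots):
            if slot is not None and slot["dead"]:
                self.slots[i] = None
                reclaimed += 1
        return reclaimed

    def live_values(self):
        return [s["value"] for s in self.slots if s is not None and not s["dead"]]


heap = ToyHeap()
a = heap.insert("v1")
b = heap.update(a, "v2")        # writes a second slot, first one is dead
size_before_vacuum = len(heap.slots)
reclaimed = heap.vacuum()
c = heap.insert("v3")           # reuses the slot freed by vacuum
```

Without the vacuum step every update would keep growing the table, which is the bloat that regular vacuuming prevents.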
59. Physical Layout - Table Files(continued...)
- _fsm file keeps track of the free spaces that can be reused by some other tuple
- _vm file keeps track of which pages contain only tuples visible to all transactions(ie: no
dead tuple gaps) by storing 2 bits per page
- the first bit(all-visible) is only set when the corresponding page has no such gaps, so the next
vacuum can skip it; the second bit(all-frozen) marks pages where every tuple is frozen
- _vm bits can only be set by vacuum, although any other data-modifying operation can clear them
- index files don’t have any _vm file
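Storing 2 bits per page means four pages share one byte of the map. The sketch below shows that packing; the class and method names are hypothetical, and only the two flag meanings (all-visible, all-frozen) come from PostgreSQL.

```python
ALL_VISIBLE = 0x01   # every tuple on the page is visible to all transactions
ALL_FROZEN = 0x02    # every tuple on the page is frozen
BITS_PER_PAGE = 2
PAGES_PER_BYTE = 8 // BITS_PER_PAGE


class ToyVisibilityMap:
    """Toy bitmap holding 2 bits per heap page (not the on-disk format)."""

    def __init__(self, n_pages):
        n_bytes = (n_pages + PAGES_PER_BYTE - 1) // PAGES_PER_BYTE
        self.bits = bytearray(n_bytes)

    def _position(self, page):
        return page // PAGES_PER_BYTE, (page % PAGES_PER_BYTE) * BITS_PER_PAGE

    def set(self, page, flags):
        # In PostgreSQL only vacuum sets these bits.
        byte, shift = self._position(page)
        self.bits[byte] |= flags << shift

    def clear(self, page):
        # Any data-modifying operation on the page clears both bits.
        byte, shift = self._position(page)
        self.bits[byte] &= ~(0x03 << shift) & 0xFF

    def get(self, page):
        byte, shift = self._position(page)
        return (self.bits[byte] >> shift) & 0x03


vm = ToyVisibilityMap(10)
vm.set(5, ALL_VISIBLE)
after_vacuum = vm.get(5)
vm.clear(5)               # e.g. a row on page 5 was updated
after_update = vm.get(5)
```

Ten pages fit into three bytes here, which hints at why the real map stays tiny compared with the table it describes.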
63. Physical Layout - Tablespace
- symlink to some other storage where table files will be stored
- pg_global tablespace is used for shared system catalogs
- pg_default tablespace is the default tablespace of the template1 and template0 databases
- different tables of the same database can be kept in different tablespaces
- use case:
- if you are running out of disk space, you can create a tablespace on a different disk and
move data to that location
- tablespace for highly accessed data can be set to fast disks like SSD and less accessed
ones can be stored on slower disks like HDD
- temporary tables can be stored in a separate tablespace
64. Summary Attempt of:
- https://www.postgresql.org/docs/12/index.html
- https://developer.okta.com/blog/2019/07/19/mysql-vs-postgres
- https://en.wikipedia.org/wiki/PostgreSQL
- Learning PostgreSQL 11 (Third Edition), A beginner's guide to building high-performance
PostgreSQL database solutions, By Salahaldin Juba, Andrey Volkov
- https://www.izenda.com/relational-vs-non-relational-databases/
- https://medium.com/@zhenwu93/relational-vs-non-relational-databases-8336870da8bc
- https://www.ibm.com/cloud/blog/new-builders/brief-overview-database-landscape