SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
Inside PostgreSQL Shared Memory
BRUCE MOMJIAN
January, 2015
POSTGRESQL is an open-source, full-featured relational database.
This presentation gives an overview of the shared memory
structures used by Postgres.
Creative Commons Attribution License http://momjian.us/presentations
1 / 30
PostgreSQL the database…
◮ Open Source Object Relational DBMS since 1996
◮ Distributed under the PostgreSQL License
◮ Similar technical heritage as Oracle, SQL Server & DB2
◮ However, a strong adherence to standards (ANSI-SQL 2008)
◮ Highly extensible and adaptable design
◮ Languages, indexing, data types, etc.
◮ E.g. PostGIS, JSONB, SQL/MED
◮ Extensive use throughout the world for applications and
organizations of all types
◮ Bundled into Red Hat Enterprise Linux, Ubuntu, CentOS
and Amazon Linux
Inside PostgreSQL Shared Memory 2 / 30
PostgreSQL the community…
◮ Independent community led by a Core Team of six
◮ Large, active and vibrant community
◮ www.postgresql.org
◮ Downloads, Mailing lists, Documentation
◮ Sponsors sampler:
◮ Google, Red Hat, VMWare, Skype, Salesforce, HP and
EnterpriseDB
◮ http://www.postgresql.org/community/
Inside PostgreSQL Shared Memory 3 / 30
EnterpriseDB the company…
◮ The worldwide leader of Postgres based products and
services founded in 2004
◮ Customers include 50 of the Fortune 500 and 98 of the
Forbes Global 2000
◮ Enterprise offerings:
◮ PostgreSQL Support, Services and Training
◮ Postgres Plus Advanced Server with Oracle Compatibility
◮ Tools for Monitoring, Replication, HA, Backup & Recovery
Community
◮ Citizenship
◮ Contributor of key features: Materialized Views, JSON, &
more
◮ Nine community members on staff
Inside PostgreSQL Shared Memory 4 / 30
EnterpriseDB the company…
Inside PostgreSQL Shared Memory 5 / 30
Outline
1. File storage format
2. Shared memory creation
3. Shared buffers
4. Row value access
5. Locking
6. Other structures
Inside PostgreSQL Shared Memory 6 / 30
File System /data
Postgres
Postgres
Postgres
/data
Inside PostgreSQL Shared Memory 7 / 30
File System /data/base
Postgres
Postgres
Postgres
/data
/pg_clog
/pg_multixact
/pg_subtrans
/pg_tblspc
/pg_xlog
/global
/pg_twophase
/base
Inside PostgreSQL Shared Memory 8 / 30
File System /data/base/db
Postgres
Postgres
Postgres
/data /base /16385 (production)
/1 (template1)
/17982 (devel)
/16821 (test)
/21452 (marketing)
Inside PostgreSQL Shared Memory 9 / 30
File System /data/base/db/table
Postgres
Postgres
Postgres
/data /base /16385 /24692 (customer)
/27214 (order)
/25932 (product)
/25952 (employee)
/27839 (part)
Inside PostgreSQL Shared Memory 10 / 30
File System Data Pages
Postgres
Postgres
Postgres
8k 8k 8k 8k
/data /base /16385 /24692
Inside PostgreSQL Shared Memory 11 / 30
Data Pages
Postgres
Postgres
Postgres
Page Header Item Item Item
Tuple
Tuple Tuple Special
8K
8k 8k 8k 8k
/data /base /16385 /24692
Inside PostgreSQL Shared Memory 12 / 30
File System Block Tuple
Postgres
Postgres
Postgres
Page Header Item Item Item
Tuple
Tuple Tuple Special
8K
8k 8k 8k 8k
/data /base /16385 /24692
Tuple
Inside PostgreSQL Shared Memory 13 / 30
File System Tuple
hoff − length of tuple header
infomask − tuple flags
natts − number of attributes
ctid − tuple id (page / item)
cmax − destruction command id
xmin − creation transaction id
xmax − destruction transaction id
cmin − creation command id
bits − bit map representing NULLs
OID − object id of tuple (optional)
Tuple
Value Value ValueValue Value Value ValueHeader
int4in(’9241’)
textout()
’Martin’
Inside PostgreSQL Shared Memory 14 / 30
Tuple Header C Structures
typedef struct HeapTupleFields
{
TransactionId t_xmin; /* inserting xact ID */
TransactionId t_xmax; /* deleting or locking xact ID */
union
{
CommandId t_cid; /* inserting or deleting command ID, or both */
TransactionId t_xvac; /* VACUUM FULL xact ID */
} t_field3;
} HeapTupleFields;
typedef struct HeapTupleHeaderData
{
union
{
HeapTupleFields t_heap;
DatumTupleFields t_datum;
} t_choice;
ItemPointerData t_ctid; /* current TID of this or newer tuple */
/* Fields below here must match MinimalTupleData! */
uint16 t_infomask2; /* number of attributes + various flags */
uint16 t_infomask; /* various flag bits, see below */
uint8 t_hoff; /* sizeof header incl. bitmap, padding */
/* ^ − 23 bytes − ^ */
bits8 t_bits[1]; /* bitmap of NULLs −− VARIABLE LENGTH */
/* MORE DATA FOLLOWS AT END OF STRUCT */
} HeapTupleHeaderData;
Inside PostgreSQL Shared Memory 15 / 30
Shared Memory Creation
postmaster postgres postgres
Program (Text)
Data
Program (Text)
Data
Shared Memory
Program (Text)
Data
Shared Memory Shared Memory
Stack Stack
fork()
Stack
Inside PostgreSQL Shared Memory 16 / 30
Shared Memory
Shared Buffers
Proc Array
PROC
Multi−XACT Buffers
Two−Phase Structs
Subtrans Buffers
CLOG Buffers
XLOG Buffers
Shared Invalidation
Lightweight Locks
Lock Hashes
Auto Vacuum
Btree Vacuum
Buffer Descriptors
Background Writer Synchronized Scan
Semaphores
Statistics
LOCK
PROCLOCK
Inside PostgreSQL Shared Memory 17 / 30
Shared Buffers
Page Header Item Item Item
Tuple
Tuple Tuple Special
8K
Postgres
Postgres
Postgres
8k 8k 8k 8k
/data /base /16385 /24692
Shared Buffers
LWLock − for page changes
Pin Count − prevent page replacement
read()
write()
8k 8k 8k
Buffer Descriptors
Inside PostgreSQL Shared Memory 18 / 30
HeapTuples
Shared Buffers
Postgres
HeapTuple
C pointer
hoff − length of tuple header
infomask − tuple flags
natts − number of attributes
ctid − tuple id (page / item)
cmax − destruction command id
xmin − creation transaction id
xmax − destruction transaction id
cmin − creation command id
bits − bit map representing NULLs
OID − object id of tuple (optional)
Tuple
Value Value ValueValue Value Value ValueHeader
int4in(’9241’)
textout()
’Martin’
Page Header Item Item Item
Tuple
Tuple Tuple Special
8K
8k 8k 8k
Inside PostgreSQL Shared Memory 19 / 30
Finding A Tuple Value in C
Datum
nocachegetattr(HeapTuple tuple,
int attnum,
TupleDesc tupleDesc,
bool *isnull)
{
HeapTupleHeader tup = tuple−>t_data;
Form_pg_attribute *att = tupleDesc−>attrs;
{
int i;
/*
* Note − This loop is a little tricky. For each non−null attribute,
* we have to first account for alignment padding before the attr,
* then advance over the attr based on its length. Nulls have no
* storage and no alignment padding either. We can use/set
* attcacheoff until we reach either a null or a var−width attribute.
*/
off = 0;
for (i = 0;; i++) /* loop exit is at "break" */
{
if (HeapTupleHasNulls(tuple) && att_isnull(i, bp))
continue; /* this cannot be the target att */
if (att[i]−>attlen == −1)
off = att_align_pointer(off, att[i]−>attalign, −1,
tp + off);
else
/* not varlena, so safe to use att_align_nominal */
off = att_align_nominal(off, att[i]−>attalign);
if (i == attnum)
break;
off = att_addlength_pointer(off, att[i]−>attlen, tp + off);
}
}
return fetchatt(att[attnum], tp + off);
}
Inside PostgreSQL Shared Memory 20 / 30
Value Access in C
#define fetch_att(T,attbyval,attlen) 
( 
(attbyval) ? 
( 
(attlen) == (int) sizeof(int32) ? 
Int32GetDatum(*((int32 *)(T))) 
: 
( 
(attlen) == (int) sizeof(int16) ? 
Int16GetDatum(*((int16 *)(T))) 
: 
( 
AssertMacro((attlen) == 1), 
CharGetDatum(*((char *)(T))) 
) 
) 
) 
: 
PointerGetDatum((char *) (T)) 
)
Inside PostgreSQL Shared Memory 21 / 30
Test And Set Lock
Can Succeed Or Fail
0/1
1
0
Success
Was 0 on exchange
Lock already taken
Was 1 on exchange
Failure
1
1
Inside PostgreSQL Shared Memory 22 / 30
Test And Set Lock
x86 Assembler
static __inline__ int
tas(volatile slock_t *lock)
{
register slock_t _res = 1;
/*
* Use a non−locking test before asserting the bus lock. Note that the
* extra test appears to be a small loss on some x86 platforms and a small
* win on others; it’s by no means clear that we should keep it.
*/
__asm__ __volatile__(
" cmpb $0,%1 n"
" jne 1f n"
" lock n"
" xchgb %0,%1 n"
"1: n"
: "+q"(_res), "+m"(*lock)
:
: "memory", "cc");
return (int) _res;
}
Inside PostgreSQL Shared Memory 23 / 30
Spin Lock
Always Succeeds
0/1
1
Success
Was 0 on exchange
Failure
Was 1 on exchange
Lock already taken
Sleep of increasing duration
0 1
1
Spinlocks are designed for short-lived locking operations, like
access to control structures. They are not be used to protect code
that makes kernel calls or other heavy operations.
Inside PostgreSQL Shared Memory 24 / 30
Light Weight Locks
Shared Buffers
Proc Array
PROC
Multi−XACT Buffers
Two−Phase Structs
Subtrans Buffers
CLOG Buffers
XLOG Buffers
Shared Invalidation
Lightweight Locks
Auto Vacuum
Btree Vacuum
Background Writer Synchronized Scan
Semaphores
Statistics
LOCK
PROCLOCK
Lock Hashes
Buffer Descriptors
Sleep On Lock
Light weight locks attempt to acquire the lock, and go to sleep on
a semaphore if the lock request fails. Spinlocks control access to
the light weight lock control structure.
Inside PostgreSQL Shared Memory 25 / 30
Database Object Locks
PROCLOCKPROC LOCK
Lock Hashes
Inside PostgreSQL Shared Memory 26 / 30
Proc
Proc Array
used usedusedempty empty empty
PROC
Inside PostgreSQL Shared Memory 27 / 30
Other Shared Memory Structures
Shared Buffers
Proc Array
PROC
Multi−XACT Buffers
Two−Phase Structs
Subtrans Buffers
CLOG Buffers
XLOG Buffers
Shared Invalidation
Lightweight Locks
Lock Hashes
Auto Vacuum
Btree Vacuum
Buffer Descriptors
Background Writer Synchronized Scan
Semaphores
Statistics
LOCK
PROCLOCK
Inside PostgreSQL Shared Memory 28 / 30
Additional Resources…
◮ Postgres Downloads:
◮ www.enterprisedb.com/downloads
◮ Product and Services information:
◮ info@enterprisedb.com
Inside PostgreSQL Shared Memory 29 / 30
Conclusion
http://momjian.us/presentations Pink Floyd: Wish You Were Here
Inside PostgreSQL Shared Memory 30 / 30

Weitere ähnliche Inhalte

Mehr von EDB

Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?EDB
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLEDB
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresEDB
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINEDB
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQLEDB
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLEDB
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!EDB
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesEDB
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoEDB
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13EDB
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLEDB
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJEDB
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLEDB
 
EDB Postgres & Tools in a Smart City Project
EDB Postgres & Tools in a Smart City ProjectEDB Postgres & Tools in a Smart City Project
EDB Postgres & Tools in a Smart City ProjectEDB
 
Migrate Today: Proactive Steps to Unhook from Oracle
Migrate Today: Proactive Steps to Unhook from OracleMigrate Today: Proactive Steps to Unhook from Oracle
Migrate Today: Proactive Steps to Unhook from OracleEDB
 
All you need to know about CREATE STATISTICS
All you need to know about CREATE STATISTICSAll you need to know about CREATE STATISTICS
All you need to know about CREATE STATISTICSEDB
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQLEDB
 
Remote DBA Service: Powering your DBA needs
Remote DBA Service: Powering your DBA needsRemote DBA Service: Powering your DBA needs
Remote DBA Service: Powering your DBA needsEDB
 
A Review of PostgreSQL Replication Approaches - APJ
A Review of PostgreSQL Replication Approaches - APJA Review of PostgreSQL Replication Approaches - APJ
A Review of PostgreSQL Replication Approaches - APJEDB
 

Mehr von EDB (20)

Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQL
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with Postgres
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQL
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQL
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJ
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos données
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - Italiano
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 
EDB Postgres & Tools in a Smart City Project
EDB Postgres & Tools in a Smart City ProjectEDB Postgres & Tools in a Smart City Project
EDB Postgres & Tools in a Smart City Project
 
Migrate Today: Proactive Steps to Unhook from Oracle
Migrate Today: Proactive Steps to Unhook from OracleMigrate Today: Proactive Steps to Unhook from Oracle
Migrate Today: Proactive Steps to Unhook from Oracle
 
All you need to know about CREATE STATISTICS
All you need to know about CREATE STATISTICSAll you need to know about CREATE STATISTICS
All you need to know about CREATE STATISTICS
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQL
 
Remote DBA Service: Powering your DBA needs
Remote DBA Service: Powering your DBA needsRemote DBA Service: Powering your DBA needs
Remote DBA Service: Powering your DBA needs
 
A Review of PostgreSQL Replication Approaches - APJ
A Review of PostgreSQL Replication Approaches - APJA Review of PostgreSQL Replication Approaches - APJ
A Review of PostgreSQL Replication Approaches - APJ
 

Kürzlich hochgeladen

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 

Kürzlich hochgeladen (20)

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 

Inside Postgres Shared Memory with EnterpriseDB

  • 1. Inside PostgreSQL Shared Memory BRUCE MOMJIAN January, 2015 POSTGRESQL is an open-source, full-featured relational database. This presentation gives an overview of the shared memory structures used by Postgres. Creative Commons Attribution License http://momjian.us/presentations 1 / 30
  • 2. PostgreSQL the database… ◮ Open Source Object Relational DBMS since 1996 ◮ Distributed under the PostgreSQL License ◮ Similar technical heritage as Oracle, SQL Server & DB2 ◮ However, a strong adherence to standards (ANSI-SQL 2008) ◮ Highly extensible and adaptable design ◮ Languages, indexing, data types, etc. ◮ E.g. PostGIS, JSONB, SQL/MED ◮ Extensive use throughout the world for applications and organizations of all types ◮ Bundled into Red Hat Enterprise Linux, Ubuntu, CentOS and Amazon Linux Inside PostgreSQL Shared Memory 2 / 30
  • 3. PostgreSQL the community… ◮ Independent community led by a Core Team of six ◮ Large, active and vibrant community ◮ www.postgresql.org ◮ Downloads, Mailing lists, Documentation ◮ Sponsors sampler: ◮ Google, Red Hat, VMWare, Skype, Salesforce, HP and EnterpriseDB ◮ http://www.postgresql.org/community/ Inside PostgreSQL Shared Memory 3 / 30
  • 4. EnterpriseDB the company… ◮ The worldwide leader of Postgres based products and services founded in 2004 ◮ Customers include 50 of the Fortune 500 and 98 of the Forbes Global 2000 ◮ Enterprise offerings: ◮ PostgreSQL Support, Services and Training ◮ Postgres Plus Advanced Server with Oracle Compatibility ◮ Tools for Monitoring, Replication, HA, Backup & Recovery Community ◮ Citizenship ◮ Contributor of key features: Materialized Views, JSON, & more ◮ Nine community members on staff Inside PostgreSQL Shared Memory 4 / 30
  • 5. EnterpriseDB the company… Inside PostgreSQL Shared Memory 5 / 30
  • 6. Outline 1. File storage format 2. Shared memory creation 3. Shared buffers 4. Row value access 5. Locking 6. Other structures Inside PostgreSQL Shared Memory 6 / 30
  • 9. File System /data/base/db Postgres Postgres Postgres /data /base /16385 (production) /1 (template1) /17982 (devel) /16821 (test) /21452 (marketing) Inside PostgreSQL Shared Memory 9 / 30
  • 10. File System /data/base/db/table Postgres Postgres Postgres /data /base /16385 /24692 (customer) /27214 (order) /25932 (product) /25952 (employee) /27839 (part) Inside PostgreSQL Shared Memory 10 / 30
  • 11. File System Data Pages Postgres Postgres Postgres 8k 8k 8k 8k /data /base /16385 /24692 Inside PostgreSQL Shared Memory 11 / 30
  • 12. Data Pages Postgres Postgres Postgres Page Header Item Item Item Tuple Tuple Tuple Special 8K 8k 8k 8k 8k /data /base /16385 /24692 Inside PostgreSQL Shared Memory 12 / 30
  • 13. File System Block Tuple Postgres Postgres Postgres Page Header Item Item Item Tuple Tuple Tuple Special 8K 8k 8k 8k 8k /data /base /16385 /24692 Tuple Inside PostgreSQL Shared Memory 13 / 30
  • 14. File System Tuple hoff − length of tuple header infomask − tuple flags natts − number of attributes ctid − tuple id (page / item) cmax − destruction command id xmin − creation transaction id xmax − destruction transaction id cmin − creation command id bits − bit map representing NULLs OID − object id of tuple (optional) Tuple Value Value ValueValue Value Value ValueHeader int4in(’9241’) textout() ’Martin’ Inside PostgreSQL Shared Memory 14 / 30
  • 15. Tuple Header C Structures typedef struct HeapTupleFields { TransactionId t_xmin; /* inserting xact ID */ TransactionId t_xmax; /* deleting or locking xact ID */ union { CommandId t_cid; /* inserting or deleting command ID, or both */ TransactionId t_xvac; /* VACUUM FULL xact ID */ } t_field3; } HeapTupleFields; typedef struct HeapTupleHeaderData { union { HeapTupleFields t_heap; DatumTupleFields t_datum; } t_choice; ItemPointerData t_ctid; /* current TID of this or newer tuple */ /* Fields below here must match MinimalTupleData! */ uint16 t_infomask2; /* number of attributes + various flags */ uint16 t_infomask; /* various flag bits, see below */ uint8 t_hoff; /* sizeof header incl. bitmap, padding */ /* ^ − 23 bytes − ^ */ bits8 t_bits[1]; /* bitmap of NULLs −− VARIABLE LENGTH */ /* MORE DATA FOLLOWS AT END OF STRUCT */ } HeapTupleHeaderData; Inside PostgreSQL Shared Memory 15 / 30
  • 16. Shared Memory Creation postmaster postgres postgres Program (Text) Data Program (Text) Data Shared Memory Program (Text) Data Shared Memory Shared Memory Stack Stack fork() Stack Inside PostgreSQL Shared Memory 16 / 30
  • 17. Shared Memory Shared Buffers Proc Array PROC Multi−XACT Buffers Two−Phase Structs Subtrans Buffers CLOG Buffers XLOG Buffers Shared Invalidation Lightweight Locks Lock Hashes Auto Vacuum Btree Vacuum Buffer Descriptors Background Writer Synchronized Scan Semaphores Statistics LOCK PROCLOCK Inside PostgreSQL Shared Memory 17 / 30
  • 18. Shared Buffers Page Header Item Item Item Tuple Tuple Tuple Special 8K Postgres Postgres Postgres 8k 8k 8k 8k /data /base /16385 /24692 Shared Buffers LWLock − for page changes Pin Count − prevent page replacement read() write() 8k 8k 8k Buffer Descriptors Inside PostgreSQL Shared Memory 18 / 30
  • 19. HeapTuples Shared Buffers Postgres HeapTuple C pointer hoff − length of tuple header infomask − tuple flags natts − number of attributes ctid − tuple id (page / item) cmax − destruction command id xmin − creation transaction id xmax − destruction transaction id cmin − creation command id bits − bit map representing NULLs OID − object id of tuple (optional) Tuple Value Value ValueValue Value Value ValueHeader int4in(’9241’) textout() ’Martin’ Page Header Item Item Item Tuple Tuple Tuple Special 8K 8k 8k 8k Inside PostgreSQL Shared Memory 19 / 30
  • 20. Finding A Tuple Value in C Datum nocachegetattr(HeapTuple tuple, int attnum, TupleDesc tupleDesc, bool *isnull) { HeapTupleHeader tup = tuple−>t_data; Form_pg_attribute *att = tupleDesc−>attrs; { int i; /* * Note − This loop is a little tricky. For each non−null attribute, * we have to first account for alignment padding before the attr, * then advance over the attr based on its length. Nulls have no * storage and no alignment padding either. We can use/set * attcacheoff until we reach either a null or a var−width attribute. */ off = 0; for (i = 0;; i++) /* loop exit is at "break" */ { if (HeapTupleHasNulls(tuple) && att_isnull(i, bp)) continue; /* this cannot be the target att */ if (att[i]−>attlen == −1) off = att_align_pointer(off, att[i]−>attalign, −1, tp + off); else /* not varlena, so safe to use att_align_nominal */ off = att_align_nominal(off, att[i]−>attalign); if (i == attnum) break; off = att_addlength_pointer(off, att[i]−>attlen, tp + off); } } return fetchatt(att[attnum], tp + off); } Inside PostgreSQL Shared Memory 20 / 30
  • 21. Value Access in C #define fetch_att(T,attbyval,attlen) ( (attbyval) ? ( (attlen) == (int) sizeof(int32) ? Int32GetDatum(*((int32 *)(T))) : ( (attlen) == (int) sizeof(int16) ? Int16GetDatum(*((int16 *)(T))) : ( AssertMacro((attlen) == 1), CharGetDatum(*((char *)(T))) ) ) ) : PointerGetDatum((char *) (T)) ) Inside PostgreSQL Shared Memory 21 / 30
  • 22. Test And Set Lock Can Succeed Or Fail 0/1 1 0 Success Was 0 on exchange Lock already taken Was 1 on exchange Failure 1 1 Inside PostgreSQL Shared Memory 22 / 30
  • 23. Test And Set Lock x86 Assembler static __inline__ int tas(volatile slock_t *lock) { register slock_t _res = 1; /* * Use a non−locking test before asserting the bus lock. Note that the * extra test appears to be a small loss on some x86 platforms and a small * win on others; it’s by no means clear that we should keep it. */ __asm__ __volatile__( " cmpb $0,%1 n" " jne 1f n" " lock n" " xchgb %0,%1 n" "1: n" : "+q"(_res), "+m"(*lock) : : "memory", "cc"); return (int) _res; } Inside PostgreSQL Shared Memory 23 / 30
  • 24. Spin Lock Always Succeeds 0/1 1 Success Was 0 on exchange Failure Was 1 on exchange Lock already taken Sleep of increasing duration 0 1 1 Spinlocks are designed for short-lived locking operations, like access to control structures. They are not be used to protect code that makes kernel calls or other heavy operations. Inside PostgreSQL Shared Memory 24 / 30
  • 25. Light Weight Locks Shared Buffers Proc Array PROC Multi−XACT Buffers Two−Phase Structs Subtrans Buffers CLOG Buffers XLOG Buffers Shared Invalidation Lightweight Locks Auto Vacuum Btree Vacuum Background Writer Synchronized Scan Semaphores Statistics LOCK PROCLOCK Lock Hashes Buffer Descriptors Sleep On Lock Light weight locks attempt to acquire the lock, and go to sleep on a semaphore if the lock request fails. Spinlocks control access to the light weight lock control structure. Inside PostgreSQL Shared Memory 25 / 30
  • 26. Database Object Locks PROCLOCKPROC LOCK Lock Hashes Inside PostgreSQL Shared Memory 26 / 30
  • 27. Proc Proc Array used usedusedempty empty empty PROC Inside PostgreSQL Shared Memory 27 / 30
  • 28. Other Shared Memory Structures Shared Buffers Proc Array PROC Multi−XACT Buffers Two−Phase Structs Subtrans Buffers CLOG Buffers XLOG Buffers Shared Invalidation Lightweight Locks Lock Hashes Auto Vacuum Btree Vacuum Buffer Descriptors Background Writer Synchronized Scan Semaphores Statistics LOCK PROCLOCK Inside PostgreSQL Shared Memory 28 / 30
  • 29. Additional Resources… ◮ Postgres Downloads: ◮ www.enterprisedb.com/downloads ◮ Product and Services information: ◮ info@enterprisedb.com Inside PostgreSQL Shared Memory 29 / 30
  • 30. Conclusion http://momjian.us/presentations Pink Floyd: Wish You Were Here Inside PostgreSQL Shared Memory 30 / 30