Clustering in PostgreSQL
Because one database server is never enough (and
neither is two)
PGConf Nepal
May 11-12, 2023
postgres=# select * from umair;
-[ RECORD 1 ]--------------------------------------------
name        | Umair Shahid
description | 20 year veteran of the PostgreSQL community
company     | Stormatics
designation | Founder
location    | Islamabad, Pakistan
family      | Mom, Wife & 2 kids
kid1        | Son, 16 year old
kid2        | Daughter, 13 year old
PostgreSQL Solutions for the Enterprise
Clustering & Analytics, backed by 24/7 Support and Professional Services
10 years of
popularity
As measured by
PostgreSQL Oracle
MySQL SQL Server
https://db-engines.com/en/ranking_trend
Loved by Developers
2022 Developer Survey
Most
Loved
Most Used
(Professional Developers)
Most
Wanted
On to the topic now!
What is High Availability?
● Remain operational even in the face of hardware or software failure
● Minimize downtime
● Essential for mission-critical applications that require 24/7 availability
● Measured in ‘Nines of Availability’
Nines of Availability
Availability Downtime per year
90% (one nine) 36.53 days
99% (two nines) 3.65 days
99.9% (three nines) 8.77 hours
99.99% (four nines) 52.60 minutes
99.999% (five nines) 5.26 minutes
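The downtime figures in the table fall straight out of the availability percentage. A quick sketch of the arithmetic (assuming a 365.25-day year, which matches the table's numbers):

```python
# Downtime per year implied by an availability percentage.
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours

def downtime_per_year_hours(availability_pct: float) -> float:
    """Hours of allowed downtime per year at the given availability."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for pct in (90.0, 99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_per_year_hours(pct):.2f} hours/year")
```

The same formula gives 4.38 hours per year for the 99.95% SLA discussed on the next slides.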
But my database resides
in the cloud, and the
cloud is always available
Right?
Wrong!
Amazon RDS Service Level Agreement
Multi-AZ configurations for MySQL, MariaDB, Oracle, and PostgreSQL are covered by the
Amazon RDS Service Level Agreement ("SLA"). The RDS SLA affirms that AWS will use
commercially reasonable efforts to make Multi-AZ instances of Amazon RDS available
with a Monthly Uptime Percentage of at least 99.95% during any monthly billing cycle. In
the event Amazon RDS does not meet the Monthly Uptime Percentage commitment,
affected customers will be eligible to receive a service credit.*
99.95% = three and a half nines = 4.38 hours of downtime per year!!!
*
https://aws.amazon.com/rds/ha/
So - what do I do if I want
better reliability for my
mission-critical data?
Clustering!
● Multiple database servers work
together to provide redundancy
● Gives the appearance of a single
database server
● Application communicates with
the primary PostgreSQL instance
● Data is replicated to standby
instances
● Auto failover in case the primary
node goes down
What is clustering?
Primary
Standby 1 Standby 2
Application
Write
Read
Replicate
What is auto failover?
Primary
Standby 1 Standby 2
Application
Standby 1
Primary
Standby 2 New Standby
Application
Primary
Standby 1 Standby 2
Application
1 2 3
- Primary node goes down - Standby 1 gets promoted to Primary
- Standby 2 becomes subscriber to
Standby 1
- New Standby is added to the cluster
- Application talks to the new Primary
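The three failover steps above can be sketched as a toy state transition. This is illustrative only; real tools such as Patroni and repmgr wrap these steps in health checks, quorum votes, and fencing:

```python
# Toy model of the auto-failover sequence:
# 1) primary goes down, 2) first standby is promoted,
# 3) remaining standby follows the new primary and a fresh
#    standby is added to restore redundancy.

def failover(cluster: dict) -> dict:
    """Promote the first standby when the primary is down."""
    if cluster["primary"]["up"]:
        return cluster  # primary healthy, nothing to do
    new_primary = cluster["standbys"].pop(0)
    remaining = cluster["standbys"]  # now replicate from new_primary
    return {
        "primary": new_primary,
        "standbys": remaining + [{"name": "new-standby", "up": True}],
    }

cluster = {
    "primary": {"name": "node-a", "up": False},
    "standbys": [{"name": "node-b", "up": True},
                 {"name": "node-c", "up": True}],
}
cluster = failover(cluster)
print(cluster["primary"]["name"])  # node-b is promoted
```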
● Write to the primary
PostgreSQL instance and
read from standbys
● Data redundancy through
replication to two standbys
● Auto failover in case the
primary node goes down
Clusters with load balancing
Primary
Standby 1 Standby 2
Application
Write
Read
Replicate
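The write-to-primary, read-from-standby split above can be sketched as a tiny router. Hostnames are hypothetical; in practice middleware such as pgpool-II performs this routing (plus pooling and caching):

```python
# Minimal read/write routing sketch for a load-balanced cluster:
# writes go to the primary, reads round-robin over the standbys.
import itertools

PRIMARY = "primary.db.internal"          # hypothetical hostnames
STANDBYS = ["standby1.db.internal", "standby2.db.internal"]
_reads = itertools.cycle(STANDBYS)

def route(sql: str) -> str:
    """Pick a node for a statement: writes to primary, reads to standbys."""
    first_word = sql.lstrip().split(None, 1)[0].upper()
    is_write = first_word in {"INSERT", "UPDATE", "DELETE",
                              "CREATE", "ALTER", "DROP"}
    return PRIMARY if is_write else next(_reads)

print(route("INSERT INTO t VALUES (1)"))  # primary.db.internal
print(route("SELECT * FROM t"))           # standby1.db.internal
```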
● Incremental backups
● Redundancy introduced
from the primary as well
as the standbys
● RTO and RPO balance
achieved per organizational
requirements
● Point-in-time recovery
Clusters with backups and disaster recovery
Primary
Standby 1 Standby 2
Application
Write
Read
Replicate
Incremental
Backup
Incremental
Backup
Incremental
Backup
Backup
● Shared-Everything architecture
● Load balancing for read as well as
write operations
● Database redundancy to achieve
high availability
● Asynchronous replication between
nodes for better efficiency
*with conflict resolution at the application layer
Multi-node clusters with Active-Active configuration*
Active 2
Application
Write Read Replicate
Active 1 Active 3
● Shared-Nothing architecture
● Automatic data sharding
based on defined criteria
● Read and write operations
are auto directed to the
relevant node
● Each node can have its own
standbys for high availability
Node 2
Application
Node 1 Node 3
Multi-node clusters with data sharding and horizontal scaling
Coordinator
Write Read
Globally distributed clusters
● Spin up clusters on the public
cloud, private cloud, on-prem,
bare metal, VMs, or a hybrid of
all the above
● Geo fencing for regulatory
compliance
● High availability across data
centers and geographies
Replication - synchronous vs asynchronous
Synchronous
● Data is transferred immediately
● Transaction waits for confirmation from
replica before it commits
● Ensures data consistency across all nodes
● Performance overhead caused by latency
● Used where data accuracy is critical, even
at the expense of performance
Asynchronous
● Data may not be transferred immediately
● Transaction commits without waiting for
confirmation from replica
● Data may be inconsistent across nodes
● Faster and more scalable
● Used where performance matters more
than data accuracy
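The latency trade-off above can be made concrete with a toy commit function: a synchronous commit waits for the replica's acknowledgement, an asynchronous commit returns immediately. (In PostgreSQL itself this behaviour is governed by synchronous_commit and synchronous_standby_names; the code below is a simulation, not that implementation.)

```python
# Simulated commit latency: synchronous waits one replica
# round-trip before returning; asynchronous returns at once.
import time

REPLICA_LATENCY = 0.05  # simulated network round-trip, seconds

def commit(synchronous: bool) -> float:
    """Return how long the client waited for the commit to return."""
    start = time.monotonic()
    if synchronous:
        time.sleep(REPLICA_LATENCY)  # wait for replica acknowledgement
    return time.monotonic() - start

# The client-visible cost of consistency: the sync commit is
# slower by at least one replica round-trip.
assert commit(synchronous=False) < commit(synchronous=True)
```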
Challenges in
Clustering
● Split brain
● Network Latency
● False alarms
● Data inconsistency
#AI
● Split brain
● Network Latency
● False alarms
● Data inconsistency
Challenges in
Clustering
● Defined: Nodes in a highly available cluster lose connectivity
with each other but continue to function independently
● Challenge: More than one node believes that it is the primary,
leading to inconsistencies and possible data loss
Split Brain
● Quorum based voting
○ Majority nodes must agree on
primary
○ Cluster stops if quorum is not
achieved
● Redundancy and failover
○ Prevents a single point of
failure
● Proper configuration
● Split brain resolver
○ Software based detection &
resolution
○ Can shut down nodes when a
split brain scenario is detected
● Network segmentation
○ Physical network separation
○ Avoid congestion
Split Brain - Prevention
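Quorum-based voting, the first prevention item above, reduces to a strict-majority check: a node may act as primary only if more than half of all voters agree, and with no majority the cluster stops accepting writes.

```python
# Strict-majority quorum check, as used to prevent split brain.
def has_quorum(votes_for_me: int, total_voters: int) -> bool:
    """True only if a strict majority of all voters agrees."""
    return votes_for_me > total_voters // 2

# 3-node cluster partitioned 2 vs 1: only the larger side keeps quorum.
assert has_quorum(2, 3) is True
assert has_quorum(1, 3) is False
# Even split in a 4-node cluster: neither side may continue, which is
# why an odd voter count (or a witness node) is preferred.
assert has_quorum(2, 4) is False
```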
Split Brain - Resolution
● Witness node
○ Third node that acts as a tie-breaker
○ The winner acts as primary, the loser is shut down
● Consensus algorithm
○ Paxos and Raft protocols are popular
● Manual resolution
○ DBA observes which nodes are competing to be the primary
○ Takes action based on best practices learnt from experience
● Split brain
● Network Latency
● False alarms
● Data inconsistency
Challenges in
Clustering
● Defined: Time delay when data is transmitted from one point to another
● Challenge: Delayed replication can result in data loss. Delayed signals
can trigger a false positive.
Network Latency
● Deploy nodes in proximity
○ Reduces time delay and network
hops
● Implement quorum-based system
○ If quorum isn’t achieved, failover
won’t occur
● High speed, low latency
networking
○ High quality hardware and
associated software
● Optimize database configuration
○ Parameter tuning based on
workloads
○ max_connections,
tcp_keepalives_idle, …
● Alerting system
○ Detect and alert admins of
possible issues to preempt false
positives
Network Latency - Preventing False Positives
● Split brain
● Network Latency
● False alarms
● Data inconsistency
Challenges in
Clustering
● Defined: A problem is reported, but in reality, there is no issue
● Challenge: Can trigger a failover when one isn’t required, leading to
unnecessary disruptions and impacting performance
False Alarms
False Alarms - Prevention
● Proper configuration
○ Best practices, past experience, and some trial & error are required to ensure that the
thresholds are configured appropriately
● Regular maintenance
○ Use the latest versions of software and firmware to avoid known bugs
and exploits
● Regular testing
○ Testing of various use cases can help identify bottlenecks and possible
misconfigurations
● Multiple monitoring tools
○ Multiple tools can help cross-reference alerts and confirm if failover is required
False Alarms - Resolution
In case a false alarm is triggered …
● Check logs
○ Check the logs of cluster nodes, network devices, and other infrastructure
components to identify any anomalies
● Notify stakeholders
○ Notify all stakeholders involved in the cluster’s operations to prevent any
unnecessary action
● Monitor cluster health
○ Monitor the cluster’s health closely to ensure that it is functioning correctly
and no further false alarms are triggered
● Split brain
● Network Latency
● False alarms
● Data inconsistency
Challenges in
Clustering
● Defined: Situations where data in different nodes of a cluster becomes out of
sync, leading to inconsistent results and potential data corruption
● Challenge: Inaccurate query results that vary based on which node is queried.
Such issues are very hard to debug.
Data Inconsistency
● Synchronous replication
○ Ensures data is synchronized
across all nodes before it is
committed
○ Induces a performance overhead
● Load balancer
○ Ensure that all queries from a
specific application are sent to
the same node
○ Relies on eventual consistency of
the cluster
Data Inconsistency - Prevention
● Monitoring tools
○ Help with early identification of
possible issues so you can take
preventive measures
● Maintenance windows
○ Minimize data disruption with
planned downtime
● Regular testing
○ Test production use cases
regularly to ensure the cluster is
behaving as expected
Data Inconsistency - Resolution
● Resynchronization
○ Manual sync between nodes to correct data inconsistencies
● Rollback
○ Point-in-time recovery using tools like pg_rewind and pgBackRest
● Monitoring
○ Diligent monitoring to ensure that the issue doesn’t recur
This all sounds really hard
Open source clustering tools for PostgreSQL
● repmgr
○ https://repmgr.org/
○ GPL v3
○ Provides automatic failover
○ Manage and monitor replication
● pgpool-II
○ https://pgpool.net/
○ Similar to BSD & MIT
○ Middleware between PostgreSQL and client applications
○ Connection pooling, load balancing, caching, and automatic failover
● Patroni
○ https://patroni.readthedocs.io/en/latest/
○ MIT
○ Template for PostgreSQL high availability clusters
○ Automatic failover, configuration management, & cluster management
Questions?
pg_umair