ADVANCED POSTGRES MONITORING
DENISH PATEL
SPEAKER
WHO IS THIS GUY?
▸ Sr. Database Architect at Medallia
▸ Recent fun employments:
▸ Principal Database Engineer @ WithMe
▸ Lead Database Architect @ OmniTI
▸ Expertise in PostgreSQL, Oracle, MySQL, NoSQL
▸ Contact : denish.j.patel@gmail.com or dpatel@medallia.com
▸ Twitter: @DenishPatel
▸ Blog: http://www.pateldenish.com
▸ Postgres Slack Channel (https://postgres-slack.herokuapp.com/)
AGENDA
DISCUSSION LIST
▸ What to look for in a monitoring solution in general?
▸ Comparison - selected open source and commercial monitoring solutions
▸ Which metrics to collect and how?
▸ Which metrics to alert on and how to define thresholds?
▸ How to keep up with monitoring changes?
▸ How to react to alerts at 3 AM?
▸ Open discussion
SELECTION CRITERIA
WHAT TO LOOK FOR IN A MONITORING SOLUTION?
▸ Blend of system monitoring with Postgres support
▸ Centralized monitoring
▸ Hosted vs On-premise
▸ Security concerns of clients
▸ Alerting and Dashboard/Graphs
▸ Easy installation and configuration
▸ Postgres Support
▸ pg_stat_statements
▸ Resource monitoring - CPU, RAM, DISK IO & Network
▸ pgbouncer support
COMPARISON
MONITORING SOLUTIONS
▸ Open Source
▸ Sensu
▸ Zabbix
▸ Zenoss (Limited capabilities)
▸ Nagios
▸ (Stop using Nagios so it can die peacefully!!)
▸ SAAS Offerings
▸ Wavefront
▸ Circonus
▸ Vividcortex
▸ OkMeter
▸ NewRelic
COMPARISON
OPEN SOURCE SOLUTIONS
Tool     Postgres support?       Configuration      Reco confidence
Sensu    Yes (check_postgres)    Easy (Graphite)    HIGH
Zabbix   Yes (plugin)            Easy               MED
Zenoss   Yes (plugin)            Easy               MED
Nagios   Yes (check_postgres)    Difficult          LOW
COMPARISON
SAAS OFFERINGS
Service                Postgres support    Configuration                     Confidence
Wavefront (collectd)   Yes                 collectd plugins                  HIGH
Circonus               Yes                 Default checks                    HIGH
Vividcortex            Yes                 Default checks                    HIGH
Okmeter                Yes                 One-click install, pgbouncer      HIGH
New Relic              Yes                 Plugins, missing some metrics     MEDIUM
COMPARISON
SAAS OFFERINGS
▸ Capacity Planning
▸ Real Time Analytics
▸ Anomaly Detection
▸ Data Retention
▸ Support Reviews
▸ Pricing
WAVEFRONT.COM
WAVEFRONT
▸ Nice Dashboard and alerting functionality
▸ Very scalable solution
▸ Works with existing metrics collection tools, e.g. collectd
▸ Real time analytics capability
▸ Complete monitoring suite
OKMETER.IO
OKMETER
▸ Agent-based system: install the agent in your environment to monitor application, database, or any other servers
▸ Very easy to install and configure
▸ Provides easy-to-configure Postgres server monitoring using pg_stat_statements together with server stats; once the agent is installed, you get everything without further effort
▸ Built-in pgbouncer monitoring
▸ Built-in monitoring of all resources: disk, CPU, network, and memory
OKMETER.IO
PGBOUNCER STATS
USE CASE
MONITORING SOLUTION
▸ 150+ DB clusters across the globe
▸ Easy installation
▸ Standardization
▸ Centralized solution
▸ Real time analytics
▸ Support new Infra - Docker/Aurora/Mesos
METRICS COLLECTION
SETUP ROLE
create role collectd login encrypted password 'XXX';
create schema collectd;
set search_path = collectd,pg_catalog;
grant usage on schema collectd to collectd;
alter role collectd set search_path = collectd,pg_catalog;
▸ Things to consider:
▸ Separate role for monitoring
▸ No superuser role
▸ Limited permissions
METRICS COLLECTION
COLLECTD PLUGIN
LoadPlugin postgresql # https://github.com/collectd/collectd/blob/master/src/postgresql_default.conf
<Plugin postgresql>
<Database dba>
Host "localhost"
Port "5432"
User "collectd"
Query backends
Query transactions
Query queries
Query table_states
Query disk_io
Query disk_usage
Query query_plans
Query connections #custom
Query slow_queries #custom
Query txn_wraparound #custom
Query locks #custom
Query wal_files #custom
Query scans #custom
Query seq_scans #custom
Query avg_querytime #custom
Query checkpoints #custom
Query slave_lag #custom
</Database>
</Plugin>
METRICS COLLECTION
BACKENDS
<Query backends>
Statement "SELECT count(*) AS count 
FROM pg_stat_activity 
WHERE datname = $1;"
Param database
<Result>
Type "pg_numbackends"
ValuesFrom "count"
</Result>
</Query>
METRICS COLLECTION
PG_STAT_ACTIVITY
create or replace function pg_stat_activity()
returns setof pg_catalog.pg_stat_activity
as $$
begin
return query(select * from pg_catalog.pg_stat_activity);
end $$
language plpgsql security definer;
revoke all on function pg_stat_activity() from public;
grant execute on function pg_stat_activity() to collectd;
METRICS COLLECTION
TRANSACTIONS
<Query transactions>
Statement "SELECT xact_commit, xact_rollback 
FROM pg_stat_database 
WHERE datname = $1;"
Param database
<Result>
Type "pg_xact"
InstancePrefix "commit"
ValuesFrom "xact_commit"
</Result>
<Result>
Type "pg_xact"
InstancePrefix "rollback"
ValuesFrom "xact_rollback"
</Result>
</Query>
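xact_commit and xact_rollback are cumulative counters, so a dashboard normally plots their change between two samples rather than the raw values. A minimal Python sketch of that conversion, assuming two consecutive samples and a known polling interval (the function names are illustrative and not part of collectd):

```python
def per_second_rate(prev_value, curr_value, interval_seconds):
    """Rate of a cumulative counter between two consecutive samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter went backwards (for example after a stats reset),
        # so no meaningful rate exists for this interval.
        return None
    return delta / interval_seconds


def rollback_ratio(commit_delta, rollback_delta):
    """Share of finished transactions that rolled back in the interval."""
    total = commit_delta + rollback_delta
    return rollback_delta / total if total else 0.0
```

For example, 600 additional commits over a 60 second interval is 10 commits per second. Returning None on a negative delta avoids plotting a bogus spike after a reset.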
METRICS COLLECTION
QUERIES (DML)
<Query queries>
Statement "SELECT sum(n_tup_ins) AS ins, 
sum(n_tup_upd) AS upd, 
sum(n_tup_del) AS del, 
sum(n_tup_hot_upd) AS hot_upd 
FROM pg_stat_user_tables;"
<Result>
Type "pg_n_tup_c"
InstancePrefix "ins"
ValuesFrom "ins"
</Result>
..
.
.
.
</Query>
METRICS COLLECTION
TABLE_STATES
<Query table_states>
Statement "SELECT sum(n_live_tup) AS live, sum(n_dead_tup) AS dead 
FROM pg_stat_user_tables;"
<Result>
Type "pg_n_tup_g"
InstancePrefix "live"
ValuesFrom "live"
</Result>
<Result>
Type "pg_n_tup_g"
InstancePrefix "dead"
ValuesFrom "dead"
</Result>
</Query>
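n_live_tup and n_dead_tup are point-in-time estimates, which is why they are stored with a gauge type. One way to read the two together is as a dead-tuple fraction. The Python helper below is a hypothetical sketch; the slides do not define it, and any alert level is left to the reader.

```python
def dead_tuple_fraction(live, dead):
    """Fraction of tuples that are dead, from summed n_live_tup / n_dead_tup."""
    total = live + dead
    # An empty database has no tuples at all; report 0.0 instead of dividing by zero.
    return dead / total if total > 0 else 0.0
```

A steadily rising fraction suggests that vacuum is not keeping up with the update and delete rate.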
METRICS COLLECTION
QUERY_PLANS
<Query query_plans>
Statement "SELECT sum(seq_scan) AS seq, 
sum(seq_tup_read) AS seq_tup_read, 
sum(idx_scan) AS idx, 
sum(idx_tup_fetch) AS idx_tup_fetch 
FROM pg_stat_user_tables;"
<Result>
Type "pg_scan"
InstancePrefix "seq"
ValuesFrom "seq"
.
.
</Query>
METRICS COLLECTION
DISK_IO
<Query disk_io>
Statement "SELECT coalesce(sum(heap_blks_read), 0) AS heap_read, 
coalesce(sum(heap_blks_hit), 0) AS heap_hit, 
coalesce(sum(idx_blks_read), 0) AS idx_read, 
coalesce(sum(idx_blks_hit), 0) AS idx_hit, 
coalesce(sum(toast_blks_read), 0) AS toast_read, 
coalesce(sum(toast_blks_hit), 0) AS toast_hit, 
coalesce(sum(tidx_blks_read), 0) AS tidx_read, 
coalesce(sum(tidx_blks_hit), 0) AS tidx_hit 
FROM pg_statio_user_tables;"
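Each read/hit pair from this query can be turned into a buffer cache hit ratio. The Python sketch below assumes the deltas of the two counters over the same interval are already available; the function name is illustrative.

```python
def cache_hit_ratio(blocks_read, blocks_hit):
    """Share of block requests served from shared buffers.

    blocks_read counts blocks that were not found in shared buffers
    (they may still have come from the OS page cache).
    """
    total = blocks_read + blocks_hit
    if total == 0:
        # No block requests in the interval: the ratio is undefined.
        return None
    return blocks_hit / total
```

For instance, 950 hits and 50 reads give a ratio of 0.95. The same helper applies to the heap, index, toast, and toast-index pairs.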
METRICS COLLECTION
DISK USAGE / DB SIZE
<Query disk_usage>
Statement "SELECT pg_database_size($1) AS size;"
Param database
<Result>
Type pg_db_size
ValuesFrom "size"
</Result>
</Query>
METRICS COLLECTION
CONNECTIONS #CUSTOM
<Query connections>
Statement "SELECT COUNT(state) AS count, state FROM (SELECT CASE
WHEN state = 'idle' THEN 'idle'
WHEN state = 'idle in transaction' THEN 'idle_in_transaction'
WHEN state = 'active' THEN 'active'
ELSE 'unknown' END AS state
FROM collectd.pg_stat_activity()) state
GROUP BY state
UNION
SELECT COUNT(*) AS count, 'waiting' AS state
FROM collectd.pg_stat_activity() WHERE waiting ;"
<Result>
Type "pg_numbackends"
InstancePrefix "state"
InstancesFrom "state"
ValuesFrom "count"
</Result>
</Query>
CONNECTIONS
METRICS COLLECTION
SLOW_QUERIES
<Query slow_queries>
Statement "SELECT COUNT(*) AS count FROM collectd.pg_stat_activity() WHERE state='active'
AND now()-query_start > '300 seconds'::interval
AND query ~* '^(insert|update|delete|select)' ;"
<Result>
Type "gauge"
InstancePrefix "pg_slow_queries"
ValuesFrom "count"
</Result>
</Query>
METRICS COLLECTION
TXN_WRAPAROUND
<Query txn_wraparound>
Statement "SELECT age(datfrozenxid) as txn_wrap_age FROM pg_database ;"
<Result>
Type "gauge"
InstancePrefix "txn_wraparound"
ValuesFrom "txn_wrap_age"
</Result>
</Query>
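The query returns one age per database, and the alerting slide uses 1.5 billion as the threshold. The Python sketch below shows how those ages could be reduced to a single status. It assumes the 32-bit transaction ID horizon of 2^31; the function and key names are hypothetical.

```python
WRAPAROUND_LIMIT = 2**31      # transaction ID horizon, about 2.1 billion
ALERT_AGE = 1_500_000_000     # threshold taken from the alerting slide


def wraparound_status(ages):
    """Summarise age(datfrozenxid) values, one per database."""
    worst = max(ages)
    return {
        "max_age": worst,
        "fraction_of_limit": worst / WRAPAROUND_LIMIT,
        "alert": worst > ALERT_AGE,
    }
```

Only the oldest database matters for the alert, so the sketch reports the maximum instead of a sum or an average.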
METRICS COLLECTION
LOCKS
<Query locks>
Statement "SELECT COUNT(mode) AS count, mode FROM pg_locks GROUP BY mode
UNION SELECT COUNT(*) AS count, 'waiting' AS mode FROM pg_locks
WHERE granted is false ;"
<Result>
Type "gauge"
InstancePrefix "pg_locks"
InstancesFrom "mode"
ValuesFrom "count"
</Result>
</Query>
DB LOCKS
METRICS COLLECTION
WAL_FILES
<Query wal_files>
Statement "SELECT archived_count AS count, failed_count AS failed FROM pg_stat_archiver;"
<Result>
Type "gauge"
InstancePrefix "pg_wal_count"
ValuesFrom "count"
</Result>
<Result>
Type "gauge"
InstancePrefix "pg_wal_failed"
ValuesFrom "failed"
</Result>
</Query>
METRICS COLLECTION
SCANS
<Query scans>
Statement "SELECT sum(idx_scan) as index_scans, sum(seq_scan) as seq_scans,
sum(idx_tup_fetch) as index_tup_fetch, sum(seq_tup_read) as seq_tup_read
FROM pg_stat_all_tables ; "
<Result>
Type "pg_scan"
InstancePrefix "index"
ValuesFrom "index_scans"
</Result>
.
.
</Query>
METRICS COLLECTION
SEQ_SCANS
<Query seq_scans>
Statement "SELECT CASE WHEN status='OK' THEN 0 ELSE 1 END AS status
FROM ( SELECT get_seq_scan_on_large_tables AS status
FROM collectd.get_seq_scan_on_large_tables()) AS foo;"
<Result>
Type "gauge"
InstancePrefix "pg_seq_scans"
ValuesFrom "status"
</Result>
</Query>
METRICS COLLECTION
SEQ_SCAN_ON_LARGE_TABLES
CREATE MATERIALIZED VIEW collectd.seq_scan_on_large_tables AS
SELECT relid, schemaname, relname, seq_scan, seq_tup_read ,
pg_relation_size(relid) as relsize, now() as refreshed_at
FROM pg_stat_all_tables
WHERE pg_relation_size(relid) > 1073741824
AND schemaname not in ('pg_catalog', 'information_schema')
UNION ALL SELECT 0,'0','0','0',0,0,now();
ALTER materialized VIEW collectd.seq_scan_on_large_tables OWNER TO collectd;
METRICS COLLECTION
GET_SEQ_SCAN_ON_LARGE_TABLES
CREATE OR REPLACE FUNCTION collectd.get_seq_scan_on_large_tables()
RETURNS text AS
$$
DECLARE
v_matview text;
v_refreshed_at timestamptz;
v_tables_with_seq_scan text[];
BEGIN
SELECT refreshed_at INTO v_refreshed_at
FROM collectd.seq_scan_on_large_tables WHERE relid=0;
-- refresh MV every 4 hours
IF v_refreshed_at < now() - interval '4 hours' and pg_is_in_recovery() is false THEN
REFRESH MATERIALIZED VIEW collectd.seq_scan_on_large_tables;
END IF;
-- compare the live counters with the snapshot stored in the materialized view
SELECT ARRAY(SELECT base.relname ||':'|| (current.seq_scan-base.seq_scan)
FROM collectd.seq_scan_on_large_tables AS base
LEFT JOIN pg_stat_all_tables AS current ON (base.schemaname=current.schemaname AND base.relname=current.relname)
WHERE (current.seq_scan-base.seq_scan) > 0 AND ((current.seq_tup_read-base.seq_tup_read)/(current.seq_scan-base.seq_scan)) > 50000)
INTO v_tables_with_seq_scan;
IF v_tables_with_seq_scan = '{}' THEN
RETURN 'OK';
ELSE
RETURN 'PROBLEM: Seq scan on table: '|| array_to_string(v_tables_with_seq_scan,'&');
END If;
END;
$$
LANGUAGE 'plpgsql' SECURITY DEFINER;
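The core of the function is a comparison between a stored baseline and the current counters: a table is flagged when it has new sequential scans and those scans read more than 50000 rows each on average. The same logic as a Python sketch, assuming both snapshots are plain dictionaries (the function name and data layout are illustrative only):

```python
def tables_with_heavy_seq_scans(baseline, current, min_avg_rows=50000):
    """Flag tables whose new sequential scans read many rows on average.

    baseline and current map a table name to (seq_scan, seq_tup_read).
    """
    flagged = []
    for name, (base_scans, base_rows) in baseline.items():
        if name not in current:
            continue
        cur_scans, cur_rows = current[name]
        new_scans = cur_scans - base_scans
        if new_scans <= 0:
            # No sequential scans since the baseline was taken.
            continue
        # Integer division mirrors the bigint arithmetic in the SQL version.
        avg_rows = (cur_rows - base_rows) // new_scans
        if avg_rows > min_avg_rows:
            flagged.append(f"{name}:{new_scans}")
    return flagged
```

Comparing against a baseline, rather than using the raw counters, keeps a table that was scanned once long ago from alerting forever.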
METRICS COLLECTION
AVG_QUERYTIME
<Query avg_querytime>
Statement "SELECT sum(total_time)/sum(calls) AS avg_querytime FROM
collectd.get_stat_statements() ;"
<Result>
Type "gauge"
InstancePrefix "pg_avg_querytime"
ValuesFrom "avg_querytime"
</Result>
</Query>
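sum(total_time)/sum(calls) is a call-weighted mean over everything recorded since the statistics were last reset, with total_time in milliseconds. The Python sketch below shows that calculation and a variant that uses two samples to get the average for just the most recent interval. Both helper names are hypothetical.

```python
def avg_query_time_ms(statements):
    """Call-weighted mean over (total_time_ms, calls) pairs."""
    total_time = 0.0
    total_calls = 0
    for time_ms, calls in statements:
        total_time += time_ms
        total_calls += calls
    return total_time / total_calls if total_calls else None


def interval_avg_ms(prev, curr):
    """Average time per call between two (sum_total_time_ms, sum_calls) samples."""
    new_calls = curr[1] - prev[1]
    if new_calls <= 0:
        # No new calls (or a stats reset): nothing to average.
        return None
    return (curr[0] - prev[0]) / new_calls
```

The interval variant reacts faster to a regression, because a long history of fast calls no longer dilutes the recent slow ones.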
METRICS COLLECTION
GET_STAT_STATEMENTS
create extension IF NOT EXISTS pg_stat_statements WITH SCHEMA collectd;
alter schema collectd owner to collectd;
CREATE OR REPLACE FUNCTION collectd.get_stat_statements() RETURNS SETOF
pg_stat_statements AS
$$
SELECT * FROM pg_stat_statements
WHERE dbid IN (SELECT oid FROM pg_database WHERE datname = current_database());
$$ LANGUAGE sql VOLATILE SECURITY DEFINER;
AVG QUERYTIME
OKMETER
METRICS COLLECTION
CHECKPOINTS
<Query checkpoints>
Statement "SELECT (checkpoints_timed + checkpoints_req) AS total_checkpoints
FROM pg_stat_bgwriter ;"
<Result>
Type "counter"
InstancePrefix "pg_checkpoints"
ValuesFrom "total_checkpoints"
</Result>
</Query>
METRICS COLLECTION
SLAVE LAG
<Query slave_lag>
Statement "SELECT CASE WHEN pg_is_in_recovery = 'false' THEN 0
ELSE COALESCE(ROUND(EXTRACT(epoch FROM now() - pg_last_xact_replay_timestamp())),0) END
AS seconds
FROM pg_is_in_recovery();"
<Result>
Type "gauge"
InstancePrefix "slave_lag"
ValuesFrom "seconds"
</Result>
</Query>
ALERTING
SETUP ALERTS ON DB METRICS
▸ Uptime
▸ Waiting Connections
▸ # of connections waiting > 5
▸ Slow queries
▸ # of slow queries > 5
▸ Seq scan on large tables
▸ TXN Wraparound
▸ Age Over 1.5B
▸ Disk space usage
▸ 85%?
▸ Slave lag
▸ 5 minutes?
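The thresholds above can be written down as data and checked in one place. The Python sketch below is an illustration only: the metric names are made up for this example, and the disk and lag values are the tentative ones the slide marks with a question mark.

```python
THRESHOLDS = {
    "waiting_connections": 5,
    "slow_queries": 5,
    "txn_wraparound_age": 1_500_000_000,
    "disk_used_percent": 85,      # tentative on the slide
    "slave_lag_seconds": 300,     # 5 minutes, tentative on the slide
}


def firing_alerts(metrics, thresholds=THRESHOLDS):
    """Return the names of metrics that exceed their threshold, sorted."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    )
```

Keeping the limits in a single mapping makes them easy to review and to deploy through config management along with the rest of the monitoring setup.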
MONITORING CHANGES
HOW TO KEEP UP?
▸ Design with failover in mind
▸ Keep an eye on new monitoring features in the latest DB or OS versions
▸ Postgres 9.5 enhancements
▸ Commit timestamp tracking
▸ SELECT * FROM pg_last_committed_xact();
▸ cluster_name
▸ $ ps -ef | grep checkpointer
▸ postgres 12181 12178 0 11:12 ? 00:00:00 postgres: personnel: checkpointer process
▸ postgres 12207 12204 0 11:12 ? 00:00:00 postgres: reportsdb: checkpointer process
▸ postgres 12233 12230 0 11:12 ? 00:00:00 postgres: management: checkpointer process
▸ A bunch of changes coming in Postgres 9.6
▸ Improved pg_stat_activity view provides more detail about which resource a session is waiting on
▸ Deploy monitoring through config management tools
INCIDENT MANAGEMENT
HOW TO BE READY TO HANDLE 3AM CALL?
▸ PagerDuty calendar : https://www.pagerduty.com/
▸ Document metrics
▸ URL for the Dashboard
▸ Alert resolution procedure
▸ Clear SLAs (Decision)
▸ Escalation policy
▸ Scenarios
▸ Wait for the server to come back up
▸ Failover
▸ Review alerts before going OnCall
▸ Oncall notification
▸ Plan for the worst and document accordingly
▸ What if you are at a movie theatre, on the beach, etc.?
▸ What if you can’t jump on the server?
▸ Keep the document up to date
ANYTHING ELSE
OPEN DISCUSSION
KEEP IN TOUCH
THANKS & Q/A
▸ You!
▸ Conference committee
▸ Contact for further Q/A
▸ Twitter: @DenishPatel
▸ pateldenish.com
▸ dpatel@medallia.com
▸ denish.j.patel@gmail.com
 
Osol Pgsql
Osol PgsqlOsol Pgsql
Osol Pgsql
 

Mehr von Denish Patel

Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Denish Patel
 
Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Denish Patel
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Denish Patel
 
Choosing the "D" , Lightning talk
Choosing the "D" , Lightning talkChoosing the "D" , Lightning talk
Choosing the "D" , Lightning talkDenish Patel
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDenish Patel
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDenish Patel
 
Achieving Pci Compliace
Achieving Pci CompliaceAchieving Pci Compliace
Achieving Pci CompliaceDenish Patel
 
Using SQL Standards? Database SQL comparition
Using SQL Standards? Database SQL comparitionUsing SQL Standards? Database SQL comparition
Using SQL Standards? Database SQL comparitionDenish Patel
 
Oracle10g New Features I
Oracle10g New Features IOracle10g New Features I
Oracle10g New Features IDenish Patel
 
Yet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRepYet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRepDenish Patel
 

Mehr von Denish Patel (11)

Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
 
Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
 
Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)Out of the Box Replication in Postgres 9.4(PgCon)
Out of the Box Replication in Postgres 9.4(PgCon)
 
Choosing the "D" , Lightning talk
Choosing the "D" , Lightning talkChoosing the "D" , Lightning talk
Choosing the "D" , Lightning talk
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQL
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQL
 
Achieving Pci Compliace
Achieving Pci CompliaceAchieving Pci Compliace
Achieving Pci Compliace
 
Using SQL Standards? Database SQL comparition
Using SQL Standards? Database SQL comparitionUsing SQL Standards? Database SQL comparition
Using SQL Standards? Database SQL comparition
 
Oracle10g New Features I
Oracle10g New Features IOracle10g New Features I
Oracle10g New Features I
 
Yet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRepYet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRep
 

Advanced Postgres Monitoring

  • 2. SPEAKER: WHO IS THIS GUY?
    ▸ Sr. Database Architect at Medallia
    ▸ Recent fun employments:
      ▸ Principal Database Engineer @ WithMe
      ▸ Lead Database Architect @ OmniTI
    ▸ Expertise in PostgreSQL, Oracle, MySQL, NoSQL
    ▸ Contact: denish.j.patel@gmail.com or dpatel@medallia.com
    ▸ Twitter: @DenishPatel
    ▸ Blog: http://www.pateldenish.com
    ▸ Postgres Slack Channel (https://postgres-slack.herokuapp.com/)
  • 3. AGENDA: DISCUSSION LIST
    ▸ What to look for in a monitoring solution in general?
    ▸ Comparison of selected open source and commercial monitoring solutions
    ▸ Which metrics to collect and how?
    ▸ Which metrics to alert on and how to define thresholds?
    ▸ How to keep up with monitoring changes?
    ▸ How to react to alerts at 3 AM?
    ▸ Open discussion
  • 4. SELECTION CRITERIA: WHAT TO LOOK FOR IN A MONITORING SOLUTION?
    ▸ Blend of system monitoring with Postgres support
    ▸ Centralized monitoring
    ▸ Hosted vs on-premise
    ▸ Security concerns of clients
    ▸ Alerting and dashboards/graphs
    ▸ Easy installation and configuration
    ▸ Postgres support
    ▸ pg_stat_statements
    ▸ Resource monitoring: CPU, RAM, disk I/O and network
    ▸ pgbouncer support
  • 5. COMPARISON: MONITORING SOLUTIONS
    ▸ Open Source
      ▸ Sensu
      ▸ Zabbix
      ▸ Zenoss (limited capabilities)
      ▸ Nagios
        ▸ (Stop using Nagios so it can die peacefully!!)
    ▸ SaaS Offerings
      ▸ Wavefront
      ▸ Circonus
      ▸ Vividcortex
      ▸ OkMeter
      ▸ NewRelic
  • 6. COMPARISON: OPEN SOURCE SOLUTIONS
      Solution | Postgres support?    | Configuration   | Reco Confidence
      Sensu    | Yes (check_postgres) | Easy (Graphite) | HIGH
      Zabbix   | Yes (plugin)         | Easy            | MED
      Zenoss   | Yes (plugin)         | Easy            | MED
      Nagios   | Yes (check_postgres) | Difficult       | LOW
  • 7. COMPARISON: SAAS OFFERINGS
      Solution             | Postgres support | Configuration                  | Confidence
      Wavefront (collectd) | Yes              | collectd plugins               | HIGH
      Circonus             | Yes              | Default checks                 | HIGH
      Vividcortex          | Yes              | Default checks                 | HIGH
      Okmeter              | Yes              | One-click install, pgbouncer   | HIGH
      New Relic            | Yes              | Plugins (missing some metrics) | MEDIUM
  • 8. COMPARISON: SAAS OFFERINGS
    ▸ Capacity Planning
    ▸ Real Time Analytics
    ▸ Anomaly Detection
    ▸ Data Retention
    ▸ Support Reviews
    ▸ Pricing
  • 9. WAVEFRONT.COM: WAVEFRONT
    ▸ Nice dashboard and alerting functionality
    ▸ Very scalable solution
    ▸ Works with existing metrics collection tools, e.g. collectd
    ▸ Real-time analytics capability
    ▸ Complete monitoring suite
  • 10. OKMETER.IO: OKMETER
    ▸ Agent-based system: you only need to install the agent in your environment to monitor application, database, or any other servers
    ▸ Very easy to install and configure
    ▸ Provides easy-to-configure Postgres server monitoring using pg_stat_statements together with server stats. Once you install the agent, you get everything without any further effort
    ▸ Built-in pgbouncer monitoring
    ▸ Built-in monitoring of all resources: disk, CPU, network and memory
  • 12. USE CASE: MONITORING SOLUTION
    ▸ 150+ DB clusters across the globe
    ▸ Easy installation
    ▸ Standardization
    ▸ Centralized solution
    ▸ Real-time analytics
    ▸ Support for new infra: Docker/Aurora/Mesos
  • 13. METRICS COLLECTION: SETUP ROLE
      create role collectd login encrypted password 'XXX';
      create schema collectd;
      set search_path = collectd,pg_catalog;
      grant usage on schema collectd to collectd;
      alter role collectd set search_path = collectd,pg_catalog;
    ▸ Things to consider:
      ▸ Separate role for monitoring
      ▸ No superuser role
      ▸ Limited permissions
  • 14. METRICS COLLECTION: COLLECTD PLUGIN
      LoadPlugin postgresql
      # https://github.com/collectd/collectd/blob/master/src/postgresql_default.conf
      <Plugin postgresql>
        <Database dba>
          Host "localhost"
          Port "5432"
          User "collectd"
          Query backends
          Query transactions
          Query queries
          Query table_states
          Query disk_io
          Query disk_usage
          Query query_plans
          Query connections #custom
          Query slow_queries #custom
          Query txn_wraparound #custom
          Query locks #custom
          Query wal_files #custom
          Query scans #custom
          Query seq_scans #custom
          Query avg_querytime #custom
          Query checkpoints #custom
          Query slave_lag #custom
        </Database>
      </Plugin>
  • 15. METRICS COLLECTION: BACKENDS
      <Query backends>
        Statement "SELECT count(*) AS count FROM pg_stat_activity WHERE datname = $1;"
        Param database
        <Result>
          Type "pg_numbackends"
          ValuesFrom "count"
        </Result>
      </Query>
  • 16. METRICS COLLECTION: PG_STAT_ACTIVITY
      create or replace function pg_stat_activity() returns setof pg_catalog.pg_stat_activity as $$
      begin
        return query(select * from pg_catalog.pg_stat_activity);
      end
      $$ language plpgsql security definer;
      revoke all on function pg_stat_activity() from public;
      grant execute on function pg_stat_activity() to collectd;
  • 17. METRICS COLLECTION: TRANSACTIONS
      <Query transactions>
        Statement "SELECT xact_commit, xact_rollback FROM pg_stat_database WHERE datname = $1;"
        Param database
        <Result>
          Type "pg_xact"
          InstancePrefix "commit"
          ValuesFrom "xact_commit"
        </Result>
        <Result>
          Type "pg_xact"
          InstancePrefix "rollback"
          ValuesFrom "xact_rollback"
        </Result>
      </Query>
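The xact_commit and xact_rollback columns are cumulative counters, so a dashboard normally plots their rate of change rather than the raw values. The following is a minimal Python sketch of that calculation, assuming two samples taken some seconds apart; the `XactSample` type and `xact_rates` function are hypothetical names for illustration and are not part of collectd or the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XactSample:
    """One reading of the transactions query (hypothetical container)."""
    ts: float           # sample time, in seconds
    xact_commit: int    # cumulative commits from pg_stat_database
    xact_rollback: int  # cumulative rollbacks from pg_stat_database


def xact_rates(prev: XactSample, curr: XactSample) -> dict:
    """Turn two cumulative samples into per-second rates and a rollback ratio."""
    elapsed = curr.ts - prev.ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    commits = curr.xact_commit - prev.xact_commit
    rollbacks = curr.xact_rollback - prev.xact_rollback
    if commits < 0 or rollbacks < 0:
        # Counters went backwards: statistics were reset between the samples.
        return {"commits_per_s": None, "rollbacks_per_s": None, "rollback_ratio": None}
    total = commits + rollbacks
    return {
        "commits_per_s": commits / elapsed,
        "rollbacks_per_s": rollbacks / elapsed,
        "rollback_ratio": rollbacks / total if total else 0.0,
    }
```

A sudden rise in the rollback ratio is often a more useful signal than the absolute commit rate, since it tends to point at application errors or lock timeouts.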
  • 18. METRICS COLLECTION: QUERIES (DML)
      <Query queries>
        Statement "SELECT sum(n_tup_ins) AS ins, sum(n_tup_upd) AS upd, sum(n_tup_del) AS del, sum(n_tup_hot_upd) AS hot_upd FROM pg_stat_user_tables;"
        <Result>
          Type "pg_n_tup_c"
          InstancePrefix "ins"
          ValuesFrom "ins"
        </Result>
        . . .
      </Query>
  • 19. METRICS COLLECTION: TABLE_STATES
      <Query table_states>
        Statement "SELECT sum(n_live_tup) AS live, sum(n_dead_tup) AS dead FROM pg_stat_user_tables;"
        <Result>
          Type "pg_n_tup_g"
          InstancePrefix "live"
          ValuesFrom "live"
        </Result>
        <Result>
          Type "pg_n_tup_g"
          InstancePrefix "dead"
          ValuesFrom "dead"
        </Result>
      </Query>
  • 20. METRICS COLLECTION: QUERY_PLANS
      <Query query_plans>
        Statement "SELECT sum(seq_scan) AS seq, sum(seq_tup_read) AS seq_tup_read, sum(idx_scan) AS idx, sum(idx_tup_fetch) AS idx_tup_fetch FROM pg_stat_user_tables;"
        <Result>
          Type "pg_scan"
          InstancePrefix "seq"
          ValuesFrom "seq"
        . .
      </Query>
  • 21. METRICS COLLECTION: DISK_IO
      <Query disk_io>
        Statement "SELECT coalesce(sum(heap_blks_read), 0) AS heap_read, coalesce(sum(heap_blks_hit), 0) AS heap_hit, coalesce(sum(idx_blks_read), 0) AS idx_read, coalesce(sum(idx_blks_hit), 0) AS idx_hit, coalesce(sum(toast_blks_read), 0) AS toast_read, coalesce(sum(toast_blks_hit), 0) AS toast_hit, coalesce(sum(tidx_blks_read), 0) AS tidx_read, coalesce(sum(tidx_blks_hit), 0) AS tidx_hit FROM pg_statio_user_tables;"
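The read and hit counters returned by the disk_io query are usually combined into a buffer cache hit ratio per block type. Below is a small Python sketch of that arithmetic, assuming the row is available as a dict keyed by the column aliases used in the statement above; the `hit_ratios` function name is hypothetical.

```python
def hit_ratios(row: dict) -> dict:
    """Compute hit / (hit + read) for each block type in a disk_io row.

    Returns None for a block type with no activity, so that an idle
    table set is not reported as a 0% hit ratio.
    """
    ratios = {}
    for kind in ("heap", "idx", "toast", "tidx"):
        hit = row[f"{kind}_hit"]
        read = row[f"{kind}_read"]
        total = hit + read
        ratios[kind] = hit / total if total else None
    return ratios
```

Note that a block counted as "read" was requested from the operating system and may still have been served from the OS page cache, so this ratio describes shared_buffers behaviour rather than physical disk reads.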
  • 22. METRICS COLLECTION: DISK USAGE / DB SIZE
      <Query disk_usage>
        Statement "SELECT pg_database_size($1) AS size;"
        Param database
        <Result>
          Type pg_db_size
          ValuesFrom "size"
        </Result>
      </Query>
  • 23. METRICS COLLECTION: CONNECTIONS #CUSTOM
      <Query connections>
        Statement "SELECT COUNT(state) AS count, state FROM (SELECT CASE WHEN state = 'idle' THEN 'idle' WHEN state = 'idle in transaction' THEN 'idle_in_transaction' WHEN state = 'active' THEN 'active' ELSE 'unknown' END AS state FROM collectd.pg_stat_activity) state GROUP BY state UNION SELECT COUNT(*) AS count, 'waiting' AS state FROM collectd.pg_stat_activity WHERE waiting ;"
        <Result>
          Type "pg_numbackends"
          InstancePrefix "state"
          InstancesFrom "state"
          ValuesFrom "count"
        </Result>
      </Query>
  • 25. METRICS COLLECTION: SLOW_QUERIES
      <Query slow_queries>
        Statement "SELECT COUNT(*) AS count FROM collectd.pg_stat_activity WHERE state='active' AND now()-query_start > '300 seconds'::interval AND query ~* '^(insert|update|delete|select)' ;"
        <Result>
          Type "counter"
          InstancePrefix "pg_slow_queries"
          ValuesFrom "count"
        </Result>
      </Query>
  • 26. METRICS COLLECTION: TXN_WRAPAROUND
      <Query txn_wraparound>
        Statement "SELECT age(datfrozenxid) as txn_wrap_age FROM pg_database ;"
        <Result>
          Type "counter"
          InstancePrefix "txn_wraparound"
          ValuesFrom "txn_wrap_age"
        </Result>
      </Query>
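The txn_wraparound statement returns one age per database, and the alert that matters is driven by the oldest one. The Python sketch below picks the worst database and expresses its age as a fraction of the roughly 2^31 transaction IDs available before wraparound; the 1.5 billion threshold is the one from the alerting slide of this deck, while the function name and the input dict of database name to age are assumptions for illustration.

```python
XID_WRAPAROUND_LIMIT = 2 ** 31   # about 2.1 billion transaction IDs
ALERT_AGE = 1_500_000_000        # "Age over 1.5B" threshold from the alerting slide


def wraparound_status(ages: dict) -> dict:
    """Summarise age(datfrozenxid) readings keyed by database name."""
    if not ages:
        raise ValueError("no databases sampled")
    worst_db = max(ages, key=ages.get)
    worst_age = ages[worst_db]
    return {
        "database": worst_db,
        "age": worst_age,
        "fraction_used": worst_age / XID_WRAPAROUND_LIMIT,
        "alert": worst_age > ALERT_AGE,
    }
```

At the 1.5 billion threshold about 70% of the available range is already consumed, which leaves time for a manual VACUUM FREEZE but not much slack.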
  • 27. METRICS COLLECTION: LOCKS
      <Query locks>
        Statement "SELECT COUNT(mode) AS count, mode FROM pg_locks GROUP BY mode UNION SELECT COUNT(*) AS count, 'waiting' AS mode FROM pg_locks WHERE granted is false ;"
        <Result>
          Type "gauge"
          InstancePrefix "pg_locks"
          InstancesFrom "mode"
          ValuesFrom "count"
        </Result>
      </Query>
  • 29. METRICS COLLECTION: WAL_FILES
      <Query wal_files>
        Statement "SELECT archived_count AS count, failed_count AS failed FROM pg_stat_archiver;"
        <Result>
          Type "gauge"
          InstancePrefix "pg_wal_count"
          ValuesFrom "count"
        </Result>
        <Result>
          Type "gauge"
          InstancePrefix "pg_wal_failed"
          ValuesFrom "failed"
        </Result>
      </Query>
  • 30. METRICS COLLECTION: SCANS
      <Query scans>
        Statement "SELECT sum(idx_scan) as index_scans, sum(seq_scan) as seq_scans, sum(idx_tup_fetch) as index_tup_fetch, sum(seq_tup_read) as seq_tup_read FROM pg_stat_all_tables ; "
        <Result>
          Type "pg_scan"
          InstancePrefix "index"
          ValuesFrom "index_scans"
        </Result>
        . .
      </Query>
  • 31. METRICS COLLECTION: SEQ_SCANS
      <Query seq_scans>
        Statement "SELECT CASE WHEN status='OK' THEN 0 ELSE 1 END AS status FROM ( SELECT get_seq_scan_on_large_tables AS status FROM collectd.get_seq_scan_on_large_tables()) AS foo;"
        <Result>
          Type "gauge"
          InstancePrefix "pg_seq_scans"
          ValuesFrom "status"
        </Result>
      </Query>
  • 32. METRICS COLLECTION: SEQ_SCAN_ON_LARGE_TABLES
      CREATE MATERIALIZED VIEW collectd.seq_scan_on_large_tables AS
        SELECT relid, schemaname, relname, seq_scan, seq_tup_read,
               pg_relation_size(relid) as relsize, now() as refreshed_at
          FROM pg_stat_all_tables
         WHERE pg_relation_size(relid) > 1073741824
           AND schemaname not in ('pg_catalog', 'information_schema')
        UNION ALL
        SELECT 0,'0','0','0',0,0,now();
      ALTER MATERIALIZED VIEW collectd.seq_scan_on_large_tables OWNER TO collectd;
  • 33. METRICS COLLECTION: GET_SEQ_SCAN_ON_LARGE_TABLES
      CREATE OR REPLACE FUNCTION collectd.get_seq_scan_on_large_tables() RETURNS text AS $$
      DECLARE
        v_matview text;
        v_refreshed_at timestamptz;
        v_tables_with_seq_scan text[];
      BEGIN
        SELECT refreshed_at INTO v_refreshed_at
          FROM collectd.seq_scan_on_large_tables WHERE relid=0;
        -- refresh MV every 4 hours
        IF v_refreshed_at < now() - interval '4 hours' and pg_is_in_recovery() is false THEN
          REFRESH MATERIALIZED VIEW collectd.seq_scan_on_large_tables;
        END IF;
        SELECT ARRAY (
          SELECT base.relname ||':'|| (current.seq_scan-base.seq_scan)
            FROM collectd.seq_scan_on_large_tables AS base
            LEFT JOIN pg_stat_all_tables AS current
              ON (base.schemaname=current.schemaname AND base.relname=current.relname)
           WHERE (current.seq_scan-base.seq_scan) > 0
             AND ((current.seq_tup_read-base.seq_tup_read)/(current.seq_scan-base.seq_scan)) > 50000
        ) INTO v_tables_with_seq_scan;
        IF v_tables_with_seq_scan = '{}' THEN
          RETURN 'OK';
        ELSE
          RETURN 'PROBLEM: Seq scan on table: '|| array_to_string(v_tables_with_seq_scan,'&');
        END IF;
      END;
      $$ LANGUAGE 'plpgsql' SECURITY DEFINER;
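The detection rule inside the function above compares current scan counters against the baseline stored in the materialized view: a table is flagged when it has had new sequential scans since the baseline and each of those scans read more than 50,000 rows on average. The Python sketch below restates that rule outside the database so it can be reasoned about in isolation. The function name and the dict layout, `(schemaname, relname)` mapped to `(seq_scan, seq_tup_read)`, are assumptions for illustration.

```python
def tables_with_seq_scans(baseline: dict, current: dict,
                          min_rows_per_scan: int = 50000) -> list:
    """Return 'relname:new_scans' for large tables scanned sequentially.

    Both arguments map (schemaname, relname) to (seq_scan, seq_tup_read).
    Integer division mirrors the bigint arithmetic of the SQL version.
    """
    flagged = []
    for key, (base_scans, base_rows) in baseline.items():
        if key not in current:
            continue  # table dropped or renamed since the baseline
        cur_scans, cur_rows = current[key]
        new_scans = cur_scans - base_scans
        if new_scans <= 0:
            continue  # no sequential scans since the baseline
        if (cur_rows - base_rows) // new_scans > min_rows_per_scan:
            flagged.append(f"{key[1]}:{new_scans}")
    return flagged
```

Averaging rows per scan is what keeps small lookups against a large table's first pages (for example, `LIMIT 1` probes) from triggering the alert.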
  • 34. METRICS COLLECTION: AVG_QUERYTIME
      <Query avg_querytime>
        Statement "SELECT sum(total_time)/sum(calls) AS avg_querytime FROM collectd.get_stat_statements() ;"
        <Result>
          Type "gauge"
          InstancePrefix "pg_avg_querytime"
          ValuesFrom "avg_querytime"
        </Result>
      </Query>
  • 35. METRICS COLLECTION: GET_STAT_STATEMENTS
      create extension IF NOT EXISTS pg_stat_statements WITH SCHEMA collectd;
      alter schema collectd owner to collectd;
      CREATE OR REPLACE FUNCTION collectd.get_stat_statements() RETURNS SETOF pg_stat_statements AS $$
        SELECT * FROM pg_stat_statements
         WHERE dbid IN (SELECT oid FROM pg_database WHERE datname = current_database());
      $$ LANGUAGE sql VOLATILE SECURITY DEFINER;
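Because pg_stat_statements accumulates total_time (in milliseconds) and calls since the last reset, sum(total_time)/sum(calls) is a lifetime average and reacts slowly to a recent regression. One possible refinement, sketched here in Python under the assumption that the collector keeps the previous sample, is to average only over the interval between two samples; the `interval_avg_ms` name and the `(sum_total_time_ms, sum_calls)` tuple layout are hypothetical.

```python
def interval_avg_ms(prev: tuple, curr: tuple):
    """Average query time in ms between two (sum_total_time_ms, sum_calls) samples.

    Returns None when no calls happened in the interval or when the
    counters went backwards (for example after pg_stat_statements_reset()).
    """
    d_time = curr[0] - prev[0]
    d_calls = curr[1] - prev[1]
    if d_calls <= 0 or d_time < 0:
        return None
    return d_time / d_calls
```

With samples `(1000.0, 100)` and `(1600.0, 300)`, the lifetime average at the second sample is about 5.3 ms while the interval average is 3.0 ms, so the two can diverge noticeably.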
  • 38. METRICS COLLECTION: CHECKPOINTS
      <Query checkpoints>
        Statement "SELECT (checkpoints_timed + checkpoints_req) AS total_checkpoints FROM pg_stat_bgwriter ;"
        <Result>
          Type "counter"
          InstancePrefix "pg_checkpoints"
          ValuesFrom "total_checkpoints"
        </Result>
      </Query>
  • 39. METRICS COLLECTION: SLAVE LAG
      <Query slave_lag>
        Statement "SELECT CASE WHEN pg_is_in_recovery = 'false' THEN 0 ELSE COALESCE(ROUND(EXTRACT(epoch FROM now() - pg_last_xact_replay_timestamp())),0) END AS seconds FROM pg_is_in_recovery();"
        <Result>
          Type "counter"
          InstancePrefix "slave_lag"
          ValuesFrom "seconds"
        </Result>
      </Query>
  • 40. ALERTING: SETUP ALERTS ON DB METRICS
    ▸ Uptime
    ▸ Waiting connections
      ▸ # of connections waiting > 5
    ▸ Slow queries
      ▸ # of slow queries > 5
    ▸ Seq scan on large tables
    ▸ TXN wraparound
      ▸ Age over 1.5B
    ▸ Disk space usage
      ▸ 85%?
    ▸ Slave lag
      ▸ 5 minutes?
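The thresholds listed on this slide can be written down as a small rule table that is evaluated against the latest metric values. The Python sketch below is one way to do that, not the configuration of any particular alerting product. The metric key names are hypothetical, the disk and lag limits are the tentative values the slide marks with question marks, and uptime is left out because the slide gives no threshold for it.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str                          # alert name as listed on the slide
    metric: str                        # hypothetical key in the metrics dict
    breached: Callable[[float], bool]  # True when the alert should fire


RULES = [
    Rule("waiting connections", "waiting_connections", lambda v: v > 5),
    Rule("slow queries", "slow_queries", lambda v: v > 5),
    Rule("seq scan on large tables", "seq_scans", lambda v: v != 0),
    Rule("txn wraparound", "txn_wrap_age", lambda v: v > 1_500_000_000),
    Rule("disk space usage", "disk_used_pct", lambda v: v >= 85),
    Rule("slave lag", "slave_lag_seconds", lambda v: v > 300),
]


def firing_alerts(metrics: dict) -> list:
    """Return the names of all rules breached by the given metric values.

    Metrics missing from the dict are skipped rather than treated as zero.
    """
    return [
        rule.name
        for rule in RULES
        if rule.metric in metrics and rule.breached(metrics[rule.metric])
    ]
```

Keeping the rules in one table like this also makes it straightforward to deploy them through configuration management, as the deck recommends later.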
  • 41. MONITORING CHANGES: HOW TO KEEP UP?
    ▸ Design with failover in mind
    ▸ Keep an eye on new monitoring features in the latest DB or OS version
    ▸ Postgres 9.5 enhancements
      ▸ Commit timestamp tracking
        ▸ SELECT * FROM pg_last_committed_xact();
      ▸ cluster_name
        ▸ $ ps -ef | grep checkpointer
        ▸ postgres 12181 12178 0 11:12 ? 00:00:00 postgres: personnel: checkpointer process
        ▸ postgres 12207 12204 0 11:12 ? 00:00:00 postgres: reportsdb: checkpointer process
        ▸ postgres 12233 12230 0 11:12 ? 00:00:00 postgres: management: checkpointer process
    ▸ A bunch of changes coming in Postgres 9.6
      ▸ The improved pg_stat_activity view provides more detail about which resource a backend is waiting on
    ▸ Deploy monitoring through config management tools
  • 42. INCIDENT MANAGEMENT: HOW TO BE READY TO HANDLE A 3 AM CALL?
    ▸ PagerDuty calendar: https://www.pagerduty.com/
    ▸ Document metrics
      ▸ URL for the dashboard
      ▸ Alert resolution procedure
      ▸ Clear SLAs (decision)
      ▸ Escalation policy
      ▸ Scenarios
        ▸ Wait for the server to come back up
        ▸ Failover
    ▸ Review alerts before going on call
    ▸ On-call notification
    ▸ Plan for the worst and document accordingly
      ▸ What if you are in a movie theatre, at the beach, etc.?
      ▸ What if you can't jump on the server?
    ▸ Keep the document up to date
  • 44. KEEP IN TOUCH: THANKS & Q/A
    ▸ You!
    ▸ Conference committee
    ▸ Contact for further Q/A
      ▸ Twitter: @DenishPatel
      ▸ pateldenish.com
      ▸ dpatel@medallia.com
      ▸ denish.j.patel@gmail.com