SlideShare ist ein Scribd-Unternehmen logo
1 von 80
Downloaden Sie, um offline zu lesen
The Art of Database
Experiments
Postgres.ai
Nikolay Samokhvalov
email: ns@postgres.ai
twitter: @postgresmen
About meAbout me:
Postgres experience: 13+ years (databases: 17+)
Past: Co-founder/CTO of 3 startups (total 30M+ users), all based on Postgres
Founder of #RuPostgres (1700+ members on Meetup.com, the 2nd largest globally)
Re-launched consulting practice in the SF Bay Area PostgreSQL.support
Founder of Postgres.ai – a platform aimed to automate what is not yet automated
Twitter: @postgresmen
Email: ns@postgres.ai
Part 1. Why database experiments?
How do we find SQL performance issues?
prod
monitoring
Engineer
prod
monitoring
Engineer
Automated signals (alerts),
manual observations
prod
monitoring
Engineer
Automated signals (alerts),
manual observations
prod
monitoring
Engineer
Automated signals (alerts),
manual observations
The most common tools for general SQL performance analysis:
prod
monitoring
Engineer
● pg_stat_statements
○ no query examples
○ no query plans
Automated signals (alerts),
manual observations
The most common tools for general SQL performance analysis:
prod
monitoring
Engineer
● pg_stat_statements
○ no query examples
○ no query plans
● log analysis (pgBadger)
○ requires maintenance
○ not “live”
○ no query plans (unless auto_explain is ON)
○ usually, not full picture
(log_min_duration_statement >> 0)
Automated signals (alerts),
manual observations
The most common tools for general SQL performance analysis:
How do we solve the issues we found?
dev/staging
m
anual
checks
manual checks
prod
monitoring
Engineer
dev/staging
m
anual
checks
manual checks
Four ways to improve performance:
prod
monitoring
Engineer
dev/staging
m
anual
checks
manual checks
Four ways to improve performance:
1. Tune Postgres configuration
prod
monitoring
Engineer
dev/staging
m
anual
checks
manual checks
Four ways to improve performance:
1. Tune Postgres configuration
2. Add/remove indexes
prod
monitoring
Engineer
dev/staging
m
anual
checks
manual checks
Four ways to improve performance:
1. Tune Postgres configuration
2. Add/remove indexes
3. Rewrite query / redesign DB schema
prod
monitoring
Engineer
dev/staging
m
anual
checks
manual checks
Four ways to improve performance:
1. Tune Postgres configuration
2. Add/remove indexes
3. Rewrite query / redesign DB schema
4. Add more resources (CPU, RAM, disks)
prod
monitoring
Engineer
Four ways to improve performance:
1. Tune Postgres configuration
2. Add/remove indexes
3. Rewrite query / redesign DB schema
4. Add more resources (CPU, RAM, disks)
Four ways to improve performance:
1. Tune Postgres configuration
2. Add/remove indexes
3. Rewrite query / redesign DB schema
4. Add more resources (CPU, RAM, disks)
~280 knobs!
Four ways to improve performance:
1. Tune Postgres configuration
2. Add/remove indexes
3. Rewrite query / redesign DB schema
4. Add more resources (CPU, RAM, disks)
~280 knobs!
btree, hash, GiST, SP-GiST, GIN,
RUM, BRIN, Bloom;
unique, partial, functional, covering
Four ways to improve performance:
1. Tune Postgres configuration
2. Add/remove indexes
3. Rewrite query / redesign DB schema
4. Add more resources (CPU, RAM, disks)
~280 knobs! No real-workload
and real-data
verification
(or very limited
and affecting
production)
btree, hash, GiST, SP-GiST, GIN,
RUM, BRIN, Bloom;
unique, partial, functional, covering
Four ways to improve performance:
1. Tune Postgres configuration
2. Add/remove indexes
3. Rewrite query / redesign DB schema
4. Add more resources (CPU, RAM, disks)
~280 knobs!
Sub-optimal
or even far
from optimal
decisions
No real-workload
and real-data
verification
(or very limited
and affecting
production)
btree, hash, GiST, SP-GiST, GIN,
RUM, BRIN, Bloom;
unique, partial, functional, covering
Four ways to improve performance:
1. Tune Postgres configuration
2. Add/remove indexes
3. Rewrite query / redesign DB schema
4. Add more resources (CPU, RAM, disks)
~280 knobs!
Expert DBA skips many steps…
Black magic!
Sub-optimal
or even far
from optimal
decisions
No real-workload
and real-data
verification
(or very limited
and affecting
production)
btree, hash, GiST, SP-GiST, GIN,
RUM, BRIN, Bloom;
unique, partial, functional, covering
Real-life examples
A real-life example. default_statistics_target: 100 vs 1000
The idea (from articles/blogs):
Default detault_statistics_target (100) is too low.
Let’s change it to 1000!
...Sounds good!
...Deployed!
But did it really improve everything? …anything?
But did it really improve everything? …anything?
Let’s check with Postgres.ai platform!
A real-life example. default_statistics_target: 100 vs 10002 years later we decided to make DB experiment to compare 100 and 1000:
A real-life example. default_statistics_target: 100 vs 1000
“before”:
default_statistics_target = 100
“after”:
default_statistics_target = 1000
A real-life example. default_statistics_target: 100 vs 1000
“before”:
default_statistics_target = 100
In general, the new value gives
better performance. Great!
“after”:
default_statistics_target = 1000
A real-life example. default_statistics_target: 100 vs 1000
“before”:
default_statistics_target = 100
Scroll down, a couple of query groups…
“after”:
default_statistics_target = 1000
A real-life example. default_statistics_target: 100 vs 1000
“before”:
default_statistics_target = 100
“after”:
default_statistics_target = 1000
One query group became much slower
after the change applied!
resolved with:
“ALTER TABLE/INDEX … ALTER COLUMN SET
STATISTICS …“
for particular table/index and column
Change management in other fields
Interfaces (GUI, CLI, API — any):
We have automated tests, CI/CD tools
GFDL and CC-BY-2.5, Wikipedia.org
One more example...
GFDL and CC-BY-2.5, Wikipedia.org
One more example...
Aviation, aeronautics, sport cars, etc:
wind tunnel
Experiments in other fields:
1. Are conducted in the special “lab” environment, not prod
(call it “staging”)
2. With very detailed, deep analysis
3. Highly automated. Multiple experimental runs, faster, less expensive
Part 2. Demo
Demo
Nancy CLI live demonstration (local + AWS)
Try it yourself (any feedback welcome): https://github.com/postgres-ai/nancy
Postgres.ai — GUI: “Create Experiment” form
Postgres.ai — GUI: “Create Experiment” form
Environment
Object
Workload
Delta
Part 3. Nancy CLI
What is a Database Experiment?
The Input:
1. Environment
hardware, software, OS, FS,
Postgres version, configuration
What is a Database Experiment?
The Input:
1. Environment
hardware, software, OS, FS,
Postgres version, configuration
2. Object
some database (e.g. cloned production)
What is a Database Experiment?
The Input:
1. Environment
hardware, software, OS, FS,
Postgres version, configuration
2. Object
some database (e.g. cloned production)
3. Workload
some SQL
What is a Database Experiment?
The Input:
1. Environment
hardware, software, OS, FS,
Postgres version, configuration
2. Object
some database (e.g. cloned production)
3. Workload
some SQL
4. Delta (optional; multiple values allowed)
some Postgres configuration change, or
new index
What is a Database Experiment?
The Input:
1. Environment
hardware, software, OS, FS,
Postgres version, configuration
2. Object
some database (e.g. cloned production)
3. Workload
some SQL
4. Delta (optional; multiple values allowed)
some Postgres configuration change, or
new index
The Output:
1. Summary
better or worse, in general?
What is a Database Experiment?
The Input:
1. Environment
hardware, software, OS, FS,
Postgres version, configuration
2. Object
some database (e.g. cloned production)
3. Workload
some SQL
4. Delta (optional; multiple values allowed)
some Postgres configuration change, or
new index
The Output:
1. Summary
better or worse, in general?
2. Artifacts
any useful “telemetry” data
What is a Database Experiment?
The Input:
1. Environment
hardware, software, OS, FS,
Postgres version, configuration
2. Object
some database (e.g. cloned production)
3. Workload
some SQL
4. Delta (optional; multiple values allowed)
some Postgres configuration change, or
new index
The Output:
1. Summary
better or worse, in general?
2. Artifacts
any useful “telemetry” data
3. Per-query deep SQL analysis
each query: better or worse?
What is a Database Experiment?
The Input:
1. Environment
hardware, software, OS, FS,
Postgres version, configuration
2. Object
some database (e.g. cloned production)
3. Workload
some SQL
4. Delta (optional; multiple values allowed)
some Postgres configuration change, or
new index
The Output:
1. Summary
better or worse, in general?
2. Artifacts
any useful “telemetry” data
3. Per-query deep SQL analysis
each query: better or worse?
+ histograms:
ms
query
group #
before
after
Is it possible with existing tools? Yes!
● Docker
● pgreplay
● pg_stat_***
● auto_explain
● pgBadger (with JSON output)
● AWS EC2 spot instances
All the blocks exist already!
Is it possible with existing tools? Yes!
● Docker
● pgreplay
● pg_stat_***
● auto_explain
● pgBadger (with JSON output)
● AWS EC2 spot instances
All the blocks exist already!
Nancy CLI: all these blocks (and more!), integrated and automated
https://github.com/postgres-ai/nancy
DIY automated pipeline for DB experiments and optimization
How to automate database optimization using ecosystem tools and AWS?
Analyze:
● pg_stat_statements
● auto_explan
● pgBadger to parse logs, use JSON output
● pg_query to group queries better
● pg_stat_kcache to analyze FS-level ops
Configuration:
● annotated.conf, pgtune, pgconfigurator, postgresqlco.nf
● ottertune
Suggested indexes (internal “what-if” API w/o actual execution)
● (useful: pgHero, POWA, HypoPG, dexter, plantuner)
Conduct experiments:
● pgreplay to replay logs (different log_line_prefix, you need to handle it)
● EC2 spot instances
Machine learning
● MADlib
pgBadger:
● Grouping queries can be implemented better (see pg_query)
● Makes all queries lower cased (hurts "camelCased" names)*
● Doesn’t really support plans (auto_explain)*
pgreplay and pgBadger are not friends,
require different log formats
*)
Fixed/improved in pgBadger 10.0
Postgres.ai — artificial DBA/DBRE assistants
AI-based cloud-friendly platform to automate database administration
54
Steve
AI-based expert in capacity planning and
database tuning
Joe
AI-based expert in query optimization and
Postgres indexes
Nancy
AI-based expert in database experiments.
Conducts experiments and presents
results to human and artificial DBAs
Sign up for early access:
https://Postgres.ai
Postgres.ai — artificial DBA/DBRE assistants
AI-based cloud-friendly platform to automate database administration
55
Steve
AI-based expert in capacity planning and
database tuning
Joe
AI-based expert in query optimization and
Postgres indexes
Nancy
AI-based expert in database experiments.
Conducts experiments and presents
results to human and artificial DBAs
Sign up for early access:
https://Postgres.ai
Meet Nancy CLI (open source)
Nancy CLI https://github.com/postgres-ai/nancy
● custom docker image (Postgres with extensions & tools)
● nancy prepare-workload to convert Postgres logs (now only .csv)
to workload binary file
● nancy run to run experiments
● able to run locally (any machine) on in EC2 spot instance (low price!),
including i3.*** instances (with NVMe)
● fully automated management of EC2 spots
What’s inside the docker container?
Source: https://github.com/postgres-ai/nancy/tree/master/docker
Image: https://hub.docker.com/r/postgresmen/postgres-with-stuff/
Inside:
● Ubuntu 16.04
● Postgres (now 9.6 or10)
● postgres_dba (for manual debugging) https://github.com/NikolayS/postgres_dba
● log_min_duration_statement = 0
● pg_stat_statements enabled, with track_io_timing = on
● auto_explain enabled (all queries, with timing)
● pgreplay
● pgBadger
● pg_stat_kcache (WIP)
Nancy CLI – Use Cases
- Deep performance analysis (“clean run”), not affecting production
- Change management
- Regression testing when upgrading (software, hardware)
- Verify config optimization ideas
- Find optimal configuration
- … and train ML models for Postgres AI!
Part 4. Real life challenges*
______________________________
*
based on experience in 4 companies
with small to mid-sized databases (up to few TBs)
and OLTP workloads (up to 15k TPS)
How to collect workload?
log_min_duration_statement = 0
Fears of log_min_duration_statement = 0
Evaluate the impact of log_min_duration_statement = 0 (quickly and w/o any changes!):
https://gist.github.com/NikolayS/08d9b7b4845371d03e195a8d8df43408 (pay attention to comments)
Only ~300 kB/s expected,
~800 log entries per second
However…
log_destination = syslog
logging_collector = off
or
log_destination = stderr
logging_collector = off
or
log_destination = csvlog
logging_collector = on
What is better
for intensive logging?
# Postgres 9.6.10
pgbench -U postgres -j2 -c24 -T60 -rnf - <<EOF
select;
EOF
# Postgres 9.6.10
pgbench -U postgres -j2 -c24 -T60 -rnf - <<EOF
select;
EOF
# Postgres 9.6.10
pgbench -U postgres -j2 -c24 -T60 -rnf - <<EOF
select;
EOF
# Postgres 9.6.10
pgbench -U postgres -j2 -c24 -T60 -rnf - <<EOF
select;
EOF
All-queries logging with syslog is
● 44x slower compared to “no logging”
● 33x slower compared to stderr /
logging collector
Be careful with syslog / journald
● Forecast expected IOPS and write bandwidth
● Analyze your logging system (syslog/journald slows down everything, use
STDERR, logging collector)
● If forecasted impact is too high (say, dozens of MBs), consider sampling
( "SET log_min_duration_statement = 0;" in particular, random sessions)
Advices for log_min_duration_statement = 0
Nancy real-world examples: shared_buffers
postila.ru,
Real workload (5 min),
61 GB RAM (ec2 i3.2xlarge),
DB size: ~500GB
Various shared_buffers
values
shared_buffers = 15GB (25%)
If we go from 25% o higher values (~80%), we improve SQL performance ~50%
Nancy real-world examples: seq_page_cost
GitLab.com: random_page_cost = 1.5 and seq_page_cost = 1
Decision was made to switch to seq_page_cost = 4
DB experiments with Nancy CLI were made to check was this decision good in
terms of performance.
Results:
https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/4835#note_106669373
– in general, SQL performance improved ~40%
WIP here, it is an open question, why is it so.
Nancy real-world examples: educate yourself
PostgreSQL Documentation “19.5. Write Ahead Log”
https://www.postgresql.org/docs/current/static/runtime-config-wal.html
Nancy real-world examples: educate yourself
PostgreSQL Documentation “19.5. Write Ahead Log”
https://www.postgresql.org/docs/current/static/runtime-config-wal.html
Just conduct DB experiment with Nancy CLI,
use --keep-alive 3600 and compare!
Thank you!
Nikolay Samokhvalov
ns@postgres.ai
twitter: @postgresmen
73
Give it a try:
https://github.com/postgres-ai/nancy
Part 5. Future development
Various ways to create an experimental database
● plain text pg_dump
○ restoration is very slow (1 vcpu utilized)
○ “logical” – physical structure is lost (cannot experiment with bloat, etc)
○ small (if compressed)
○ “snapshot” only
● pg_dump with either -Fd (“directory”) or -Fc (“custom”):
○ restoration is faster (multiple vCPUs, -j option)
○ “logical” (again: bloat, physical layout is “lost”)
○ small (because compressed)
○ “snapshot” only
● pg_basebackup + WALs, point-in-time recovery (PITR), possibly with help from WAL-E, WAL-G, pgBackRest
○ less reliable, sometimes there issues (especially if 3rd party tools involved - e.g. WAL-E & WAL-G don’t support
tablespaces, there are bugs sometimes, etc)
○ “physical”: bloat and physical structure is preserved
○ not small – ~ size of the DB
○ can “walk in time” (PITR)
○ requires warm-up procedure (data is not in the memory!)
● AWS RDS: create a replica + promote it
○ no Spots :-/
○ Lazy Load is tricky (it looks like the DB is there but it’s very slow – warm-up is needed)
● Snapshots
● Ideas for serialization
○ Stop Postgres / rsync “back” or re-copy locally on NVMe / start Postgres
How can we speed up experimental runs?
● Prepare the EC2 instance(s) in advance and keep it
● Prepare EBS volume(s) only (perhaps, using an instance of the different
type) and keep it ready. When attached to the new instance, do warm-up
● Resource re-usage:
○ reuse docker container
○ reuse EC2 instance
○ serialize experimental runs serialization (DDL Do/Undo; VACUUM FULL; cleanup)
● Partial database snapshots (dump/restore only needed tables)
The future development of Nancy CLI
● Speedup DB creation
● ZFS/XFS snapshots to revert PGDATA state within seconds
● Support GCP
● More artifacts delivered: pg_stat_kcache, etc
● nancy see-report to print the summary + top-30 queries
● nancy compare-reports to print the “diff” for 2+ reports (the summary + numbers for top-30 queries,
ordered by by total time based on the 1st report)
● Postgres 11
● pgbench -i for database initialization
● pgbench to generate multithreaded synthetic workload
● Workload analysis: automatically detect “N+1 SELECT” when running workload
● Better support for the serialization of experimental runs
● Better support for multiple runs https://github.com/postgres-ai/nancy/pull/97
○ interval with step
○ gradient descent
● Provide costs estimation (time + money)
● Rewrite in Python or Go
Feedback/contributions welcome
https://github.com/postgres-ai/nancy
Challenge: security issues
Problem: a developer doesn’t have access to production.
Nancy works with production data/workload.
What about permissions and data protection?
Challenge: security issues
Problem: a developer doesn’t have access to production.
Nancy works with production data/workload.
What about permissions and data protection?
Possible solutions:
● Option 1: allow using Nancy CLI only to those who already have access
production (DBAs, SREs, DBREs)
● Option 2: obfuscate data when preparing a DB clone (no universal
solution yet, TODO)
● Option 3: allow access only to GUI, hide/obfuscate parameters (TODO)
Challenge: reliable results
Issues:
1. Single runs is not enough (fluctuations) – must repeat
2. “Before”/”after” runs on 2 different machines / EC2 nodes – “not fair” comparison
(defective hardware, throttling)
Solutions (ideally: combination of them):
● Sequential runs
● 4+ iterations of each experimental run
● “Baseline benchmark” https://github.com/postgres-ai/nancy/issues/94

Weitere ähnliche Inhalte

Was ist angesagt?

How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 
PostgreSQL Extensions: A deeper look
PostgreSQL Extensions:  A deeper lookPostgreSQL Extensions:  A deeper look
PostgreSQL Extensions: A deeper lookJignesh Shah
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Databricks
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재PgDay.Seoul
 
Using PostgreSQL for Data Privacy
Using PostgreSQL for Data PrivacyUsing PostgreSQL for Data Privacy
Using PostgreSQL for Data PrivacyMason Sharp
 
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정PgDay.Seoul
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep InternalEXEM
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1Federico Campoli
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDatabricks
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres MonitoringDenish Patel
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymenthyeongchae lee
 
SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0Databricks
 
PostgreSQL Materialized Views with Active Record
PostgreSQL Materialized Views with Active RecordPostgreSQL Materialized Views with Active Record
PostgreSQL Materialized Views with Active RecordDavid Roberts
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)NAVER D2
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 

Was ist angesagt? (20)

How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
PostgreSQL Extensions: A deeper look
PostgreSQL Extensions:  A deeper lookPostgreSQL Extensions:  A deeper look
PostgreSQL Extensions: A deeper look
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
 
Using PostgreSQL for Data Privacy
Using PostgreSQL for Data PrivacyUsing PostgreSQL for Data Privacy
Using PostgreSQL for Data Privacy
 
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep Internal
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres Monitoring
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deployment
 
SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0SQL Performance Improvements at a Glance in Apache Spark 3.0
SQL Performance Improvements at a Glance in Apache Spark 3.0
 
PostgreSQL Materialized Views with Active Record
PostgreSQL Materialized Views with Active RecordPostgreSQL Materialized Views with Active Record
PostgreSQL Materialized Views with Active Record
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 

Ähnlich wie The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose

Nancy CLI. Automated Database Experiments
Nancy CLI. Automated Database ExperimentsNancy CLI. Automated Database Experiments
Nancy CLI. Automated Database ExperimentsNikolay Samokhvalov
 
How Many Slaves (Ukoug)
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)Doug Burns
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-applicationNguyễn Duy Nhân
 
Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedDainius Jocas
 
Planning For High Performance Web Application
Planning For High Performance Web ApplicationPlanning For High Performance Web Application
Planning For High Performance Web ApplicationYue Tian
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and OptimizationMongoDB
 
Performance Tuning Cheat Sheet for MongoDB
Performance Tuning Cheat Sheet for MongoDBPerformance Tuning Cheat Sheet for MongoDB
Performance Tuning Cheat Sheet for MongoDBSeveralnines
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Codemotion
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Demi Ben-Ari
 
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 TaipeiPostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 TaipeiSatoshi Nagayasu
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in RustAndrew Lamb
 
Oracle to PostgreSQL, Challenges to Opportunity.pdf
Oracle to PostgreSQL, Challenges to Opportunity.pdfOracle to PostgreSQL, Challenges to Opportunity.pdf
Oracle to PostgreSQL, Challenges to Opportunity.pdfEqunix Business Solutions
 
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudPGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudEqunix Business Solutions
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Omid Vahdaty
 
Hannes end-of-the-router-tnc17
Hannes end-of-the-router-tnc17Hannes end-of-the-router-tnc17
Hannes end-of-the-router-tnc17Hannes Gredler
 
Webinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationWebinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationMongoDB
 
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...ronwarshawsky
 
Drilling Cyber Security Data With Apache Drill
Drilling Cyber Security Data With Apache DrillDrilling Cyber Security Data With Apache Drill
Drilling Cyber Security Data With Apache DrillCharles Givre
 

Ähnlich wie The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose (20)

Nancy CLI. Automated Database Experiments
Nancy CLI. Automated Database ExperimentsNancy CLI. Automated Database Experiments
Nancy CLI. Automated Database Experiments
 
How Many Slaves (Ukoug)
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management Application
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-application
 
Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at Vinted
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Planning For High Performance Web Application
Planning For High Performance Web ApplicationPlanning For High Performance Web Application
Planning For High Performance Web Application
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and Optimization
 
Performance Tuning Cheat Sheet for MongoDB
Performance Tuning Cheat Sheet for MongoDBPerformance Tuning Cheat Sheet for MongoDB
Performance Tuning Cheat Sheet for MongoDB
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
 
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 TaipeiPostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
Oracle to PostgreSQL, Challenges to Opportunity.pdf
Oracle to PostgreSQL, Challenges to Opportunity.pdfOracle to PostgreSQL, Challenges to Opportunity.pdf
Oracle to PostgreSQL, Challenges to Opportunity.pdf
 
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudPGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...
 
Hannes end-of-the-router-tnc17
Hannes end-of-the-router-tnc17Hannes end-of-the-router-tnc17
Hannes end-of-the-router-tnc17
 
Webinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationWebinar: Performance Tuning + Optimization
Webinar: Performance Tuning + Optimization
 
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
 
Drilling Cyber Security Data With Apache Drill
Drilling Cyber Security Data With Apache DrillDrilling Cyber Security Data With Apache Drill
Drilling Cyber Security Data With Apache Drill
 

Mehr von Nikolay Samokhvalov

Эксперименты с Postgres в Docker и облаках — оптимизация настроек и схемы ва...
 Эксперименты с Postgres в Docker и облаках — оптимизация настроек и схемы ва... Эксперименты с Postgres в Docker и облаках — оптимизация настроек и схемы ва...
Эксперименты с Postgres в Docker и облаках — оптимизация настроек и схемы ва...Nikolay Samokhvalov
 
Промышленный подход к тюнингу PostgreSQL: эксперименты над базами данных
Промышленный подход к тюнингу PostgreSQL: эксперименты над базами данныхПромышленный подход к тюнингу PostgreSQL: эксперименты над базами данных
Промышленный подход к тюнингу PostgreSQL: эксперименты над базами данныхNikolay Samokhvalov
 
#RuPostgresLive 4: как писать и читать сложные SQL-запросы
#RuPostgresLive 4: как писать и читать сложные SQL-запросы#RuPostgresLive 4: как писать и читать сложные SQL-запросы
#RuPostgresLive 4: как писать и читать сложные SQL-запросыNikolay Samokhvalov
 
#RuPostgresLive 4: как писать и читать сложные SQL-запросы
#RuPostgresLive 4: как писать и читать сложные SQL-запросы#RuPostgresLive 4: как писать и читать сложные SQL-запросы
#RuPostgresLive 4: как писать и читать сложные SQL-запросыNikolay Samokhvalov
 
Database First! О распространённых ошибках использования РСУБД
Database First! О распространённых ошибках использования РСУБДDatabase First! О распространённых ошибках использования РСУБД
Database First! О распространённых ошибках использования РСУБДNikolay Samokhvalov
 
#RuPostges в Yandex, эпизод 3. Что же нового в PostgreSQL 9.6
#RuPostges в Yandex, эпизод 3. Что же нового в PostgreSQL 9.6#RuPostges в Yandex, эпизод 3. Что же нового в PostgreSQL 9.6
#RuPostges в Yandex, эпизод 3. Что же нового в PostgreSQL 9.6Nikolay Samokhvalov
 
#noBackend, или Как выжить в эпоху толстеющих клиентов
#noBackend, или Как выжить в эпоху толстеющих клиентов#noBackend, или Как выжить в эпоху толстеющих клиентов
#noBackend, или Как выжить в эпоху толстеющих клиентовNikolay Samokhvalov
 
#PostgreSQLRussia в банке Тинькофф, доклад №1
#PostgreSQLRussia в банке Тинькофф, доклад №1#PostgreSQLRussia в банке Тинькофф, доклад №1
#PostgreSQLRussia в банке Тинькофф, доклад №1Nikolay Samokhvalov
 
SFPUG 2015.11.20 lightning talk "PostgreSQL in Russia"
SFPUG 2015.11.20 lightning talk "PostgreSQL in Russia"SFPUG 2015.11.20 lightning talk "PostgreSQL in Russia"
SFPUG 2015.11.20 lightning talk "PostgreSQL in Russia"Nikolay Samokhvalov
 
Владимир Бородин: Как спать спокойно - 2015.10.14 PostgreSQLRussia.org meetu...
Владимир Бородин: Как спать спокойно - 2015.10.14 PostgreSQLRussia.org meetu...Владимир Бородин: Как спать спокойно - 2015.10.14 PostgreSQLRussia.org meetu...
Владимир Бородин: Как спать спокойно - 2015.10.14 PostgreSQLRussia.org meetu...Nikolay Samokhvalov
 
#PostgreSQLRussia 2015.09.15 - Николай Самохвалов - 5 главных особенностей Po...
#PostgreSQLRussia 2015.09.15 - Николай Самохвалов - 5 главных особенностей Po...#PostgreSQLRussia 2015.09.15 - Николай Самохвалов - 5 главных особенностей Po...
#PostgreSQLRussia 2015.09.15 - Николай Самохвалов - 5 главных особенностей Po...Nikolay Samokhvalov
 
#PostgreSQLRussia 2015.09.15 - Максим Трегубов, CUSTIS - Миграция из Oracle в...
#PostgreSQLRussia 2015.09.15 - Максим Трегубов, CUSTIS - Миграция из Oracle в...#PostgreSQLRussia 2015.09.15 - Максим Трегубов, CUSTIS - Миграция из Oracle в...
#PostgreSQLRussia 2015.09.15 - Максим Трегубов, CUSTIS - Миграция из Oracle в...Nikolay Samokhvalov
 
Три вызова реляционным СУБД и новый PostgreSQL - #PostgreSQLRussia семинар по...
Три вызова реляционным СУБД и новый PostgreSQL - #PostgreSQLRussia семинар по...Три вызова реляционным СУБД и новый PostgreSQL - #PostgreSQLRussia семинар по...
Три вызова реляционным СУБД и новый PostgreSQL - #PostgreSQLRussia семинар по...Nikolay Samokhvalov
 
2014.12.23 Николай Самохвалов, Ещё раз о JSON(b) в PostgreSQL 9.4
2014.12.23 Николай Самохвалов, Ещё раз о JSON(b) в PostgreSQL 9.42014.12.23 Николай Самохвалов, Ещё раз о JSON(b) в PostgreSQL 9.4
2014.12.23 Николай Самохвалов, Ещё раз о JSON(b) в PostgreSQL 9.4Nikolay Samokhvalov
 
2014.12.23 Александр Андреев, Parallels
2014.12.23 Александр Андреев, Parallels2014.12.23 Александр Андреев, Parallels
2014.12.23 Александр Андреев, ParallelsNikolay Samokhvalov
 
2014.10.15 Сергей Бурладян, Avito.ru
2014.10.15 Сергей Бурладян, Avito.ru2014.10.15 Сергей Бурладян, Avito.ru
2014.10.15 Сергей Бурладян, Avito.ruNikolay Samokhvalov
 
2014.10.15 Мурат Кабилов, Avito.ru #PostgreSQLRussia
2014.10.15 Мурат Кабилов, Avito.ru #PostgreSQLRussia2014.10.15 Мурат Кабилов, Avito.ru #PostgreSQLRussia
2014.10.15 Мурат Кабилов, Avito.ru #PostgreSQLRussiaNikolay Samokhvalov
 
2014.10.15 блиц-доклад PostgreSQL kNN search
2014.10.15 блиц-доклад PostgreSQL kNN search2014.10.15 блиц-доклад PostgreSQL kNN search
2014.10.15 блиц-доклад PostgreSQL kNN searchNikolay Samokhvalov
 
2014.09.24 история небольшого успеха с PostgreSQL (Yandex)
2014.09.24 история небольшого успеха с PostgreSQL (Yandex)2014.09.24 история небольшого успеха с PostgreSQL (Yandex)
2014.09.24 история небольшого успеха с PostgreSQL (Yandex)Nikolay Samokhvalov
 

Mehr von Nikolay Samokhvalov (20)

Эксперименты с Postgres в Docker и облаках — оптимизация настроек и схемы ва...
 Эксперименты с Postgres в Docker и облаках — оптимизация настроек и схемы ва... Эксперименты с Postgres в Docker и облаках — оптимизация настроек и схемы ва...
Эксперименты с Postgres в Docker и облаках — оптимизация настроек и схемы ва...
 
Промышленный подход к тюнингу PostgreSQL: эксперименты над базами данных
Промышленный подход к тюнингу PostgreSQL: эксперименты над базами данныхПромышленный подход к тюнингу PostgreSQL: эксперименты над базами данных
Промышленный подход к тюнингу PostgreSQL: эксперименты над базами данных
 
#RuPostgresLive 4: как писать и читать сложные SQL-запросы
#RuPostgresLive 4: как писать и читать сложные SQL-запросы#RuPostgresLive 4: как писать и читать сложные SQL-запросы
#RuPostgresLive 4: как писать и читать сложные SQL-запросы
 
#RuPostgresLive 4: как писать и читать сложные SQL-запросы
#RuPostgresLive 4: как писать и читать сложные SQL-запросы#RuPostgresLive 4: как писать и читать сложные SQL-запросы
#RuPostgresLive 4: как писать и читать сложные SQL-запросы
 
Database First! О распространённых ошибках использования РСУБД
Database First! О распространённых ошибках использования РСУБДDatabase First! О распространённых ошибках использования РСУБД
Database First! О распространённых ошибках использования РСУБД
 
2016.10.13 PostgreSQL in Russia
2016.10.13 PostgreSQL in Russia2016.10.13 PostgreSQL in Russia
2016.10.13 PostgreSQL in Russia
 
#RuPostges в Yandex, эпизод 3. Что же нового в PostgreSQL 9.6
#RuPostges в Yandex, эпизод 3. Что же нового в PostgreSQL 9.6#RuPostges в Yandex, эпизод 3. Что же нового в PostgreSQL 9.6
#RuPostges в Yandex, эпизод 3. Что же нового в PostgreSQL 9.6
 
#noBackend, или Как выжить в эпоху толстеющих клиентов
#noBackend, или Как выжить в эпоху толстеющих клиентов#noBackend, или Как выжить в эпоху толстеющих клиентов
#noBackend, или Как выжить в эпоху толстеющих клиентов
 
#PostgreSQLRussia в банке Тинькофф, доклад №1
#PostgreSQLRussia в банке Тинькофф, доклад №1#PostgreSQLRussia в банке Тинькофф, доклад №1
#PostgreSQLRussia в банке Тинькофф, доклад №1
 
SFPUG 2015.11.20 lightning talk "PostgreSQL in Russia"
SFPUG 2015.11.20 lightning talk "PostgreSQL in Russia"SFPUG 2015.11.20 lightning talk "PostgreSQL in Russia"
SFPUG 2015.11.20 lightning talk "PostgreSQL in Russia"
 
Владимир Бородин: Как спать спокойно - 2015.10.14 PostgreSQLRussia.org meetu...
Владимир Бородин: Как спать спокойно - 2015.10.14 PostgreSQLRussia.org meetu...Владимир Бородин: Как спать спокойно - 2015.10.14 PostgreSQLRussia.org meetu...
Владимир Бородин: Как спать спокойно - 2015.10.14 PostgreSQLRussia.org meetu...
 
#PostgreSQLRussia 2015.09.15 - Николай Самохвалов - 5 главных особенностей Po...
#PostgreSQLRussia 2015.09.15 - Николай Самохвалов - 5 главных особенностей Po...#PostgreSQLRussia 2015.09.15 - Николай Самохвалов - 5 главных особенностей Po...
#PostgreSQLRussia 2015.09.15 - Николай Самохвалов - 5 главных особенностей Po...
 
#PostgreSQLRussia 2015.09.15 - Максим Трегубов, CUSTIS - Миграция из Oracle в...
#PostgreSQLRussia 2015.09.15 - Максим Трегубов, CUSTIS - Миграция из Oracle в...#PostgreSQLRussia 2015.09.15 - Максим Трегубов, CUSTIS - Миграция из Oracle в...
#PostgreSQLRussia 2015.09.15 - Максим Трегубов, CUSTIS - Миграция из Oracle в...
 
Три вызова реляционным СУБД и новый PostgreSQL - #PostgreSQLRussia семинар по...
Три вызова реляционным СУБД и новый PostgreSQL - #PostgreSQLRussia семинар по...Три вызова реляционным СУБД и новый PostgreSQL - #PostgreSQLRussia семинар по...
Три вызова реляционным СУБД и новый PostgreSQL - #PostgreSQLRussia семинар по...
 
2014.12.23 Николай Самохвалов, Ещё раз о JSON(b) в PostgreSQL 9.4
2014.12.23 Николай Самохвалов, Ещё раз о JSON(b) в PostgreSQL 9.42014.12.23 Николай Самохвалов, Ещё раз о JSON(b) в PostgreSQL 9.4
2014.12.23 Николай Самохвалов, Ещё раз о JSON(b) в PostgreSQL 9.4
 
2014.12.23 Александр Андреев, Parallels
2014.12.23 Александр Андреев, Parallels2014.12.23 Александр Андреев, Parallels
2014.12.23 Александр Андреев, Parallels
 
2014.10.15 Сергей Бурладян, Avito.ru
2014.10.15 Сергей Бурладян, Avito.ru2014.10.15 Сергей Бурладян, Avito.ru
2014.10.15 Сергей Бурладян, Avito.ru
 
2014.10.15 Мурат Кабилов, Avito.ru #PostgreSQLRussia
2014.10.15 Мурат Кабилов, Avito.ru #PostgreSQLRussia2014.10.15 Мурат Кабилов, Avito.ru #PostgreSQLRussia
2014.10.15 Мурат Кабилов, Avito.ru #PostgreSQLRussia
 
2014.10.15 блиц-доклад PostgreSQL kNN search
2014.10.15 блиц-доклад PostgreSQL kNN search2014.10.15 блиц-доклад PostgreSQL kNN search
2014.10.15 блиц-доклад PostgreSQL kNN search
 
2014.09.24 история небольшого успеха с PostgreSQL (Yandex)
2014.09.24 история небольшого успеха с PostgreSQL (Yandex)2014.09.24 история небольшого успеха с PostgreSQL (Yandex)
2014.09.24 история небольшого успеха с PostgreSQL (Yandex)
 

Kürzlich hochgeladen

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Kürzlich hochgeladen (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose

  • 1. The Art of Database Experiments Postgres.ai Nikolay Samokhvalov email: ns@postgres.ai twitter: @postgresmen
  • 2. About meAbout me: Postgres experience: 13+ years (databases: 17+) Past: Co-founder/CTO of 3 startups (total 30M+ users), all based on Postgres Founder of #RuPostgres (1700+ members on Meetup.com, the 2nd largest globally) Re-launched consulting practice in the SF Bay Area PostgreSQL.support Founder of Postgres.ai – a platform aimed to automate what is not yet automated Twitter: @postgresmen Email: ns@postgres.ai
  • 3. Part 1. Why database experiments?
  • 4. How do we find SQL performance issues?
  • 8. prod monitoring Engineer Automated signals (alerts), manual observations The most common tools for general SQL performance analysis:
  • 9. prod monitoring Engineer ● pg_stat_statements ○ no query examples ○ no query plans Automated signals (alerts), manual observations The most common tools for general SQL performance analysis:
  • 10. prod monitoring Engineer ● pg_stat_statements ○ no query examples ○ no query plans ● log analysis (pgBadger) ○ requires maintenance ○ not “live” ○ no query plans (unless auto_explain is ON) ○ usually, not full picture (log_min_duration_statement >> 0) Automated signals (alerts), manual observations The most common tools for general SQL performance analysis:
  • 11. How do we solve the issues we found?
  • 13. dev/staging m anual checks manual checks Four ways to improve performance: prod monitoring Engineer
  • 14. dev/staging m anual checks manual checks Four ways to improve performance: 1. Tune Postgres configuration prod monitoring Engineer
  • 15. dev/staging m anual checks manual checks Four ways to improve performance: 1. Tune Postgres configuration 2. Add/remove indexes prod monitoring Engineer
  • 16. dev/staging m anual checks manual checks Four ways to improve performance: 1. Tune Postgres configuration 2. Add/remove indexes 3. Rewrite query / redesign DB schema prod monitoring Engineer
  • 17. dev/staging m anual checks manual checks Four ways to improve performance: 1. Tune Postgres configuration 2. Add/remove indexes 3. Rewrite query / redesign DB schema 4. Add more resources (CPU, RAM, disks) prod monitoring Engineer
  • 18. Four ways to improve performance: 1. Tune Postgres configuration 2. Add/remove indexes 3. Rewrite query / redesign DB schema 4. Add more resources (CPU, RAM, disks)
  • 19. Four ways to improve performance: 1. Tune Postgres configuration 2. Add/remove indexes 3. Rewrite query / redesign DB schema 4. Add more resources (CPU, RAM, disks) ~280 knobs!
  • 20. Four ways to improve performance: 1. Tune Postgres configuration 2. Add/remove indexes 3. Rewrite query / redesign DB schema 4. Add more resources (CPU, RAM, disks) ~280 knobs! btree, hash, GiST, SP-GiST, GIN, RUM, BRIN, Bloom; unique, partial, functional, covering
  • 21. Four ways to improve performance: 1. Tune Postgres configuration 2. Add/remove indexes 3. Rewrite query / redesign DB schema 4. Add more resources (CPU, RAM, disks) ~280 knobs! No real-workload and real-data verification (or very limited and affecting production) btree, hash, GiST, SP-GiST, GIN, RUM, BRIN, Bloom; unique, partial, functional, covering
  • 22. Four ways to improve performance: 1. Tune Postgres configuration 2. Add/remove indexes 3. Rewrite query / redesign DB schema 4. Add more resources (CPU, RAM, disks) ~280 knobs! Sub-optimal or even far from optimal decisions No real-workload and real-data verification (or very limited and affecting production) btree, hash, GiST, SP-GiST, GIN, RUM, BRIN, Bloom; unique, partial, functional, covering
  • 23. Four ways to improve performance: 1. Tune Postgres configuration 2. Add/remove indexes 3. Rewrite query / redesign DB schema 4. Add more resources (CPU, RAM, disks) ~280 knobs! Expert DBA skips many steps… Black magic! Sub-optimal or even far from optimal decisions No real-workload and real-data verification (or very limited and affecting production) btree, hash, GiST, SP-GiST, GIN, RUM, BRIN, Bloom; unique, partial, functional, covering
  • 25. A real-life example. default_statistics_target: 100 vs 1000 The idea (from articles/blogs): Default detault_statistics_target (100) is too low. Let’s change it to 1000! ...Sounds good! ...Deployed!
  • 26. But did it really improve everything? …anything?
  • 27. But did it really improve everything? …anything? Let’s check with Postgres.ai platform!
  • 28. A real-life example. default_statistics_target: 100 vs 10002 years later we decided to make DB experiment to compare 100 and 1000:
  • 29. A real-life example. default_statistics_target: 100 vs 1000 “before”: default_statistics_target = 100 “after”: default_statistics_target = 1000
  • 30. A real-life example. default_statistics_target: 100 vs 1000 “before”: default_statistics_target = 100 In general, the new value gives better performance. Great! “after”: default_statistics_target = 1000
  • 31. A real-life example. default_statistics_target: 100 vs 1000 “before”: default_statistics_target = 100 Scroll down, a couple of query groups… “after”: default_statistics_target = 1000
  • 32. A real-life example. default_statistics_target: 100 vs 1000 “before”: default_statistics_target = 100 “after”: default_statistics_target = 1000 One query group became much slower after the change applied! resolved with: “ALTER TABLE/INDEX … ALTER COLUMN SET STATISTICS …“ for particular table/index and column
  • 33. Change management in other fields
  • 34. Interfaces (GUI, CLI, API — any): We have automated tests, CI/CD tools
  • 35. GFDL and CC-BY-2.5, Wikipedia.org One more example...
  • 36. GFDL and CC-BY-2.5, Wikipedia.org One more example... Aviation, aeronautics, sport cars, etc: wind tunnel
  • 37. Experiments in other fields: 1. Are conducted in the special “lab” environment, not prod (call it “staging”) 2. With very detailed, deep analysis 3. Highly automated. Multiple experimental runs, faster, less expensive
  • 39. Demo Nancy CLI live demonstration (local + AWS) Try it yourself (any feedback welcome): https://github.com/postgres-ai/nancy
  • 40. Postgres.ai — GUI: “Create Experiment” form
  • 41. Postgres.ai — GUI: “Create Experiment” form Environment Object Workload Delta
  • 43. What is a Database Experiment? The Input: 1. Environment hardware, software, OS, FS, Postgres version, configuration
  • 44. What is a Database Experiment? The Input: 1. Environment hardware, software, OS, FS, Postgres version, configuration 2. Object some database (e.g. cloned production)
  • 45. What is a Database Experiment? The Input: 1. Environment hardware, software, OS, FS, Postgres version, configuration 2. Object some database (e.g. cloned production) 3. Workload some SQL
  • 46. What is a Database Experiment? The Input: 1. Environment hardware, software, OS, FS, Postgres version, configuration 2. Object some database (e.g. cloned production) 3. Workload some SQL 4. Delta (optional; multiple values allowed) some Postgres configuration change, or new index
  • 47. What is a Database Experiment? The Input: 1. Environment hardware, software, OS, FS, Postgres version, configuration 2. Object some database (e.g. cloned production) 3. Workload some SQL 4. Delta (optional; multiple values allowed) some Postgres configuration change, or new index The Output: 1. Summary better or worse, in general?
  • 48. What is a Database Experiment? The Input: 1. Environment hardware, software, OS, FS, Postgres version, configuration 2. Object some database (e.g. cloned production) 3. Workload some SQL 4. Delta (optional; multiple values allowed) some Postgres configuration change, or new index The Output: 1. Summary better or worse, in general? 2. Artifacts any useful “telemetry” data
  • 49. What is a Database Experiment? The Input: 1. Environment hardware, software, OS, FS, Postgres version, configuration 2. Object some database (e.g. cloned production) 3. Workload some SQL 4. Delta (optional; multiple values allowed) some Postgres configuration change, or new index The Output: 1. Summary better or worse, in general? 2. Artifacts any useful “telemetry” data 3. Per-query deep SQL analysis each query: better or worse?
  • 50. What is a Database Experiment? The Input: 1. Environment hardware, software, OS, FS, Postgres version, configuration 2. Object some database (e.g. cloned production) 3. Workload some SQL 4. Delta (optional; multiple values allowed) some Postgres configuration change, or new index The Output: 1. Summary better or worse, in general? 2. Artifacts any useful “telemetry” data 3. Per-query deep SQL analysis each query: better or worse? + histograms: ms query group # before after
  • 51. Is it possible with existing tools? Yes! ● Docker ● pgreplay ● pg_stat_*** ● auto_explain ● pgBadger (with JSON output) ● AWS EC2 spot instances All the blocks exist already!
  • 52. Is it possible with existing tools? Yes! ● Docker ● pgreplay ● pg_stat_*** ● auto_explain ● pgBadger (with JSON output) ● AWS EC2 spot instances All the blocks exist already! Nancy CLI: all these blocks (and more!), integrated and automated https://github.com/postgres-ai/nancy
  • 53. DIY automated pipeline for DB experiments and optimization How to automate database optimization using ecosystem tools and AWS? Analyze: ● pg_stat_statements ● auto_explan ● pgBadger to parse logs, use JSON output ● pg_query to group queries better ● pg_stat_kcache to analyze FS-level ops Configuration: ● annotated.conf, pgtune, pgconfigurator, postgresqlco.nf ● ottertune Suggested indexes (internal “what-if” API w/o actual execution) ● (useful: pgHero, POWA, HypoPG, dexter, plantuner) Conduct experiments: ● pgreplay to replay logs (different log_line_prefix, you need to handle it) ● EC2 spot instances Machine learning ● MADlib pgBadger: ● Grouping queries can be implemented better (see pg_query) ● Makes all queries lower cased (hurts "camelCased" names)* ● Doesn’t really support plans (auto_explain)* pgreplay and pgBadger are not friends, require different log formats *) Fixed/improved in pgBadger 10.0
  • 54. Postgres.ai — artificial DBA/DBRE assistants AI-based cloud-friendly platform to automate database administration 54 Steve AI-based expert in capacity planning and database tuning Joe AI-based expert in query optimization and Postgres indexes Nancy AI-based expert in database experiments. Conducts experiments and presents results to human and artificial DBAs Sign up for early access: https://Postgres.ai
  • 55. Postgres.ai — artificial DBA/DBRE assistants AI-based cloud-friendly platform to automate database administration 55 Steve AI-based expert in capacity planning and database tuning Joe AI-based expert in query optimization and Postgres indexes Nancy AI-based expert in database experiments. Conducts experiments and presents results to human and artificial DBAs Sign up for early access: https://Postgres.ai
  • 56. Meet Nancy CLI (open source) Nancy CLI https://github.com/postgres-ai/nancy ● custom docker image (Postgres with extensions & tools) ● nancy prepare-workload to convert Postgres logs (now only .csv) to workload binary file ● nancy run to run experiments ● able to run locally (any machine) on in EC2 spot instance (low price!), including i3.*** instances (with NVMe) ● fully automated management of EC2 spots
  • 57. What’s inside the docker container? Source: https://github.com/postgres-ai/nancy/tree/master/docker Image: https://hub.docker.com/r/postgresmen/postgres-with-stuff/ Inside: ● Ubuntu 16.04 ● Postgres (now 9.6 or10) ● postgres_dba (for manual debugging) https://github.com/NikolayS/postgres_dba ● log_min_duration_statement = 0 ● pg_stat_statements enabled, with track_io_timing = on ● auto_explain enabled (all queries, with timing) ● pgreplay ● pgBadger ● pg_stat_kcache (WIP)
  • 58. Nancy CLI – Use Cases - Deep performance analysis (“clean run”), not affecting production - Change management - Regression testing when upgrading (software, hardware) - Verify config optimization ideas - Find optimal configuration - … and train ML models for Postgres AI!
  • 59. Part 4. Real life challenges* ______________________________ * based on experience in 4 companies with small to mid-sized databases (up to few TBs) and OLTP workloads (up to 15k TPS)
  • 60. How to collect workload? log_min_duration_statement = 0
  • 61. Fears of log_min_duration_statement = 0 Evaluate the impact of log_min_duration_statement = 0 (quickly and w/o any changes!): https://gist.github.com/NikolayS/08d9b7b4845371d03e195a8d8df43408 (pay attention to comments) Only ~300 kB/s expected, ~800 log entries per second
  • 63. log_destination = syslog logging_collector = off or log_destination = stderr logging_collector = off or log_destination = csvlog logging_collector = on What is better for intensive logging?
  • 64. # Postgres 9.6.10 pgbench -U postgres -j2 -c24 -T60 -rnf - <<EOF select; EOF
  • 65. # Postgres 9.6.10 pgbench -U postgres -j2 -c24 -T60 -rnf - <<EOF select; EOF
  • 66. # Postgres 9.6.10 pgbench -U postgres -j2 -c24 -T60 -rnf - <<EOF select; EOF
  • 67. # Postgres 9.6.10 pgbench -U postgres -j2 -c24 -T60 -rnf - <<EOF select; EOF All-queries logging with syslog is ● 44x slower compared to “no logging” ● 33x slower compared to stderr / logging collector Be careful with syslog / journald
  • 68. ● Forecast expected IOPS and write bandwidth ● Analyze your logging system (syslog/journald slows down everything, use STDERR, logging collector) ● If forecasted impact is too high (say, dozens of MBs), consider sampling ( "SET log_min_duration_statement = 0;" in particular, random sessions) Advices for log_min_duration_statement = 0
  • 69. Nancy real-world examples: shared_buffers postila.ru, Real workload (5 min), 61 GB RAM (ec2 i3.2xlarge), DB size: ~500GB Various shared_buffers values shared_buffers = 15GB (25%) If we go from 25% o higher values (~80%), we improve SQL performance ~50%
  • 70. Nancy real-world examples: seq_page_cost GitLab.com: random_page_cost = 1.5 and seq_page_cost = 1 Decision was made to switch to seq_page_cost = 4 DB experiments with Nancy CLI were made to check was this decision good in terms of performance. Results: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/4835#note_106669373 – in general, SQL performance improved ~40% WIP here, it is an open question, why is it so.
  • 71. Nancy real-world examples: educate yourself PostgreSQL Documentation “19.5. Write Ahead Log” https://www.postgresql.org/docs/current/static/runtime-config-wal.html
  • 72. Nancy real-world examples: educate yourself PostgreSQL Documentation “19.5. Write Ahead Log” https://www.postgresql.org/docs/current/static/runtime-config-wal.html Just conduct DB experiment with Nancy CLI, use --keep-alive 3600 and compare!
  • 73. Thank you! Nikolay Samokhvalov ns@postgres.ai twitter: @postgresmen 73 Give it a try: https://github.com/postgres-ai/nancy
  • 74. Part 5. Future development
  • 75. Various ways to create an experimental database ● plain text pg_dump ○ restoration is very slow (1 vcpu utilized) ○ “logical” – physical structure is lost (cannot experiment with bloat, etc) ○ small (if compressed) ○ “snapshot” only ● pg_dump with either -Fd (“directory”) or -Fc (“custom”): ○ restoration is faster (multiple vCPUs, -j option) ○ “logical” (again: bloat, physical layout is “lost”) ○ small (because compressed) ○ “snapshot” only ● pg_basebackup + WALs, point-in-time recovery (PITR), possibly with help from WAL-E, WAL-G, pgBackRest ○ less reliable, sometimes there issues (especially if 3rd party tools involved - e.g. WAL-E & WAL-G don’t support tablespaces, there are bugs sometimes, etc) ○ “physical”: bloat and physical structure is preserved ○ not small – ~ size of the DB ○ can “walk in time” (PITR) ○ requires warm-up procedure (data is not in the memory!) ● AWS RDS: create a replica + promote it ○ no Spots :-/ ○ Lazy Load is tricky (it looks like the DB is there but it’s very slow – warm-up is needed) ● Snapshots ● Ideas for serialization ○ Stop Postgres / rsync “back” or re-copy locally on NVMe / start Postgres
  • 76. How can we speed up experimental runs? ● Prepare the EC2 instance(s) in advance and keep it ● Prepare EBS volume(s) only (perhaps, using an instance of the different type) and keep it ready. When attached to the new instance, do warm-up ● Resource re-usage: ○ reuse docker container ○ reuse EC2 instance ○ serialize experimental runs serialization (DDL Do/Undo; VACUUM FULL; cleanup) ● Partial database snapshots (dump/restore only needed tables)
  • 77. The future development of Nancy CLI ● Speedup DB creation ● ZFS/XFS snapshots to revert PGDATA state within seconds ● Support GCP ● More artifacts delivered: pg_stat_kcache, etc ● nancy see-report to print the summary + top-30 queries ● nancy compare-reports to print the “diff” for 2+ reports (the summary + numbers for top-30 queries, ordered by by total time based on the 1st report) ● Postgres 11 ● pgbench -i for database initialization ● pgbench to generate multithreaded synthetic workload ● Workload analysis: automatically detect “N+1 SELECT” when running workload ● Better support for the serialization of experimental runs ● Better support for multiple runs https://github.com/postgres-ai/nancy/pull/97 ○ interval with step ○ gradient descent ● Provide costs estimation (time + money) ● Rewrite in Python or Go Feedback/contributions welcome https://github.com/postgres-ai/nancy
  • 78. Challenge: security issues Problem: a developer doesn’t have access to production. Nancy works with production data/workload. What about permissions and data protection?
  • 79. Challenge: security issues Problem: a developer doesn’t have access to production. Nancy works with production data/workload. What about permissions and data protection? Possible solutions: ● Option 1: allow using Nancy CLI only to those who already have access production (DBAs, SREs, DBREs) ● Option 2: obfuscate data when preparing a DB clone (no universal solution yet, TODO) ● Option 3: allow access only to GUI, hide/obfuscate parameters (TODO)
  • 80. Challenge: reliable results Issues: 1. Single runs is not enough (fluctuations) – must repeat 2. “Before”/”after” runs on 2 different machines / EC2 nodes – “not fair” comparison (defective hardware, throttling) Solutions (ideally: combination of them): ● Sequential runs ● 4+ iterations of each experimental run ● “Baseline benchmark” https://github.com/postgres-ai/nancy/issues/94