
Understanding Autovacuum

Autovacuuming is one of the most common causes of stubbed toes for PostgreSQL newbies. This talk will be a deep dive into what vacuuming and autovacuuming do, why they are necessary, how to tune them, and how to evaluate whether or not your tuning is correct. I'll also discuss some lessons learned from doing this in a pathologically bloat-heavy context.

  1. Understanding Autovacuum: Dan Robinson, CTO, Heap
  2. whoami
     • Joined as Heap's first hire in July 2013
     • Previously a backend engineer at Palantir
     • Stanford '11 in Math and CS
  3. Overview
     • What is vacuuming? Why is it necessary?
     • How vacuuming works under the hood.
     • How autovacuum orchestrates vacuums, and how to tune it.
     • Case study from a pathological context.
     • More practical tuning / diagnostics.
  4. What's MVCC?
     • Need some way to keep transactions consistent and isolated under high concurrency.
     • Locks are not a good way to do this.
     • Enter MVCC (multi-version concurrency control).
  5. What's on disk:
      xmin | xmax | val
     ------+------+-----
         1 |    0 |  10
         2 |    0 |  20
         2 |    0 |  30
     SELECT * FROM table:
      val
     -----
       10
       20
       30
     (3 rows)
  6. What's on disk (XID 3 has deleted the row with val 10):
      xmin | xmax | val
     ------+------+-----
         1 |    3 |  10
         2 |    0 |  20
         2 |    0 |  30
     SELECT * FROM table:
      val
     -----
       20
       30
     (2 rows)
  7. What's on disk (XID 4 has updated 20 and 30 to 21 and 31, leaving the old versions behind):
      xmin | xmax | val
     ------+------+-----
         1 |    3 |  10
         2 |    4 |  20
         2 |    4 |  30
         4 |    0 |  21
         4 |    0 |  31
     SELECT * FROM table:
      val
     -----
       21
       31
     (2 rows)
  8. What's on disk:
      xmin | xmax | val
     ------+------+-----
         1 |    3 |  10
         2 |    4 |  20
         2 |    4 |  30
         4 |    0 |  21
         4 |    0 |  31
     These numbers are 32-bit!
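The visibility rule behind slides 5 through 8 fits in a few lines of Python. This is a deliberately simplified model, assuming every listed XID has committed and that a snapshot sees exactly the committed XIDs below its own; real PostgreSQL snapshots also track in-progress XIDs, subtransactions, and hint bits. `TupleVersion` and `visible` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class TupleVersion:
    xmin: int  # XID that created this row version
    xmax: int  # XID that deleted or replaced it; 0 means "never"
    val: int


def visible(t: TupleVersion, snapshot_xid: int, committed: Set[int]) -> bool:
    """Simplified MVCC check (assumption: a snapshot sees committed XIDs below snapshot_xid)."""
    created = t.xmin in committed and t.xmin < snapshot_xid
    deleted = t.xmax != 0 and t.xmax in committed and t.xmax < snapshot_xid
    return created and not deleted


# The heap from slide 8: XID 3 deleted 10, XID 4 updated 20 and 30.
heap = [
    TupleVersion(1, 3, 10),
    TupleVersion(2, 4, 20),
    TupleVersion(2, 4, 30),
    TupleVersion(4, 0, 21),
    TupleVersion(4, 0, 31),
]
committed = {1, 2, 3, 4}

print([t.val for t in heap if visible(t, 4, committed)])  # [20, 30]
print([t.val for t in heap if visible(t, 5, committed)])  # [21, 31]
```

A snapshot that predates XID 4 still sees 20 and 30, which is why the old row versions cannot simply be overwritten when the UPDATE happens.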
  9. VACUUM Problem Statement
     • Need to clean up dead versions of rows.
     • Need to prevent "xid wraparound".
     • Want to do other performance-improving maintenance.
  10.
       xmin | xmax | val
      ------+------+-----
          1 |    3 |  10   <-- XID 3 is committed, and XID 3 is less than all active XIDs
          2 |    4 |  20
          2 |    4 |  30
          4 |    0 |  21
          4 |    0 |  31
  11.
       xmin | xmax | val
      ------+------+-----
          1 |    3 |  10   <-- recorded in the Free Space Map
          2 |    4 |  20
          2 |    4 |  30
          4 |    0 |  21
          4 |    0 |  31
  12. INSERT INTO foo VALUES (40)
       xmin | xmax | val
      ------+------+-----
          9 |    0 |  40   <-- reuses the slot from the Free Space Map
          2 |    4 |  20
          2 |    4 |  30
          4 |    0 |  21
          4 |    0 |  31
  13. • Go through the table adding dead rows to the FSM.
      • Also need to remove dead rows from indexes.
      • A single long-running transaction wrecks everything: rows that died after its snapshot was taken can't be reclaimed until it finishes.
  14. • Go through the table adding dead rows to the FSM.
      • Also need to remove dead rows from indexes.
      • A single long-running transaction wrecks everything: rows that died after its snapshot was taken can't be reclaimed until it finishes.
      • Still not deleting anything! The freed space is reused, but the table file generally doesn't shrink.
        ‣ "Halp, I'm running VACUUM and my table isn't shrinking!"
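Slides 10 to 14 describe a plain (non-FULL) vacuum pass. The sketch below models it on the slide 8 heap, under simplifying assumptions: free space is tracked per row slot rather than per page as PostgreSQL's real Free Space Map does, index cleanup is skipped, and `horizon` stands for the oldest XID that some running snapshot may still treat as in progress. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class TupleVersion:
    xmin: int
    xmax: int  # 0 means the version was never deleted
    val: int


Heap = List[Optional[TupleVersion]]  # None marks a reusable slot


def lazy_vacuum(heap: Heap, committed: Set[int], horizon: int, fsm: List[int]) -> None:
    """Reclaim versions deleted by a committed XID older than every running snapshot."""
    for slot, t in enumerate(heap):
        if t is not None and t.xmax != 0 and t.xmax in committed and t.xmax < horizon:
            heap[slot] = None  # slot becomes reusable, but the heap does not shrink
            fsm.append(slot)   # remember the free slot for future inserts


def insert(heap: Heap, fsm: List[int], t: TupleVersion) -> None:
    """Reuse a slot from the free-space list if possible, else extend the heap."""
    if fsm:
        heap[fsm.pop(0)] = t
    else:
        heap.append(t)


def slide8_heap() -> Heap:
    return [TupleVersion(1, 3, 10), TupleVersion(2, 4, 20), TupleVersion(2, 4, 30),
            TupleVersion(4, 0, 21), TupleVersion(4, 0, 31)]


heap, fsm = slide8_heap(), []
lazy_vacuum(heap, {1, 2, 3, 4}, horizon=4, fsm=fsm)  # some snapshot still treats XID 4 as in progress
insert(heap, fsm, TupleVersion(9, 0, 40))
print([t.val if t else None for t in heap])  # [40, 20, 30, 21, 31]

# A long-running transaction holds the horizon back, so nothing is reclaimed:
stuck, stuck_fsm = slide8_heap(), []
lazy_vacuum(stuck, {1, 2, 3, 4}, horizon=3, fsm=stuck_fsm)
print(stuck_fsm)  # []
```

The first run reproduces slide 12; the second shows why one old transaction can stall cleanup for the whole table.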
  15. VACUUM FULL
      • Rewrites the table with no dead space.
      • Serious drawbacks:
        ‣ Locks the table: no reads or writes allowed!
        ‣ You should basically never use this.
  16. VACUUM FULL
      • Rewrites the table with no dead space.
      • Serious drawbacks:
        ‣ Locks the table: no reads or writes allowed!
        ‣ You should basically never use this.
      • Can also use pg_repack in a pinch.
  17. VACUUM Problem Statement
      • Need to clean up dead versions of rows.
      ➡ Need to prevent "xid wraparound".
      • Want to do other performance-improving maintenance.
  18. The XID Ring
      [Diagram: XID 0, XID 1, XID 2, ..., XID 2^32-1 arranged in a circle.]
  19. [Diagram: the XID ring divided into "The Future" and "The Past", with XID 2 and XID 0 marked.]
  20. [Same diagram, labeled "Problem!"]
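The ring on slides 18 to 20 can be made concrete with a comparison that treats the 32-bit difference between two XIDs as signed, so each XID sees roughly 2^31 XIDs behind it as the past and 2^31 ahead as the future. This is a simplified sketch (`xid_precedes` is a hypothetical name); real PostgreSQL compares its special XIDs 0 to 2 differently.

```python
XID_SPACE = 2**32  # XIDs are 32-bit counters that wrap around


def xid_precedes(a: int, b: int) -> bool:
    """True if XID `a` lies in the past relative to XID `b` on the ring."""
    diff = (a - b) % XID_SPACE
    if diff >= 2**31:
        diff -= XID_SPACE  # reinterpret as a signed 32-bit value
    return diff < 0


print(xid_precedes(100, 200))              # True: ordinary case
print(xid_precedes(XID_SPACE - 10, 5))     # True: the counter wrapped, old XID still in the past
print(xid_precedes(100, 100 + 2**31 + 1))  # False: XID 100 now looks like the future!
```

The last line is the "Problem!": once the current XID gets more than 2^31 ahead of a row's unfrozen xmin, that row appears to have been created in the future and becomes invisible.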
  21.
       xmin | xmax | val
      ------+------+-----
          1 |    3 |  10   <-- XID 3 is committed, and XID 3 is less than all active txns
          2 |    4 |  20
          2 |    4 |  30
          4 |    0 |  21
          4 |    0 |  31
  22.
       xmin | xmax | val
      ------+------+-----
          1 |    3 |  10   <-- recorded in the Free Space Map
          2 |    4 |  20
          2 |    4 |  30
          4 |    0 |  21
          4 |    0 |  31
  23.
       xmin | xmax | val
      ------+------+-----
          1 |    3 |  10   <-- recorded in the Free Space Map
          2 |    4 |  20
          2 |    4 |  30
          4 |    0 |  21   <-- XID 4 was committed a really long time ago
          4 |    0 |  31   <-- XID 4 was committed a really long time ago
  24.
       xmin | xmax | val
      ------+------+-----
          1 |    3 |  10   <-- recorded in the Free Space Map
          2 |    4 |  20
          2 |    4 |  30
       FXID |    0 |  21   <-- Frozen
       FXID |    0 |  31   <-- Frozen
  25. • VACUUM goes through the table freezing old rows.
      • In contrast to cleaning up dead rows, this is mandatory!
        ‣ "Halp, I set autovacuum = off but it's still running!"
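The freezing step on slides 23 to 25 can be sketched as a pass that marks sufficiently old live rows as frozen. The `frozen` flag stands in for the slide's "FXID"; it is closer to how PostgreSQL 9.4 and later mark frozen rows with tuple-header flag bits than to literally overwriting xmin. The 50 million threshold is the default vacuum_freeze_min_age; the function names are hypothetical, and dead rows are simply skipped here rather than removed.

```python
from dataclasses import dataclass
from typing import List

XID_SPACE = 2**32


@dataclass
class TupleVersion:
    xmin: int
    xmax: int  # 0 = still live
    val: int
    frozen: bool = False  # stands in for the slide's "FXID"


def xid_age(xid: int, current_xid: int) -> int:
    """How many XIDs ago `xid` was assigned, modulo wraparound."""
    return (current_xid - xid) % XID_SPACE


def freeze_pass(heap: List[TupleVersion], current_xid: int, freeze_min_age: int) -> int:
    """Mark live rows older than freeze_min_age as frozen; return how many were frozen."""
    count = 0
    for t in heap:
        if t.xmax == 0 and not t.frozen and xid_age(t.xmin, current_xid) > freeze_min_age:
            t.frozen = True  # now visible to everyone, regardless of XID comparisons
            count += 1
    return count


heap = [TupleVersion(1, 3, 10), TupleVersion(2, 4, 20), TupleVersion(2, 4, 30),
        TupleVersion(4, 0, 21), TupleVersion(4, 0, 31)]
print(freeze_pass(heap, current_xid=60_000_000, freeze_min_age=50_000_000))  # 2
```

Once a row is frozen, its xmin no longer takes part in the ring comparison, so the wraparound problem cannot make it disappear.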
  26. VACUUM Problem Statement
      • Need to clean up dead versions of rows.
      • Need to prevent "xid wraparound".
      ➡ Want to do other performance-improving maintenance.
  27. Perf-Improving Maintenance
      • Update table statistics.
      • Update visibility map.
  28. VACUUM Problem Statement
      • Need to clean up dead versions of rows.
      • Need to prevent "xid wraparound".
      • Want to do other performance-improving maintenance.
  29. Tuning VACUUM
      • Controlling I/O impact of vacuums.
      • Tuning memory use of vacuums.
      • Trade-offs around row-freezing.
      • Configuring autovacuum's orchestration of these processes.
  30. Controlling I/O Impact: vacuum_cost_limit and vacuum_cost_delay
        while (table_not_done)
            do vacuum_cost_limit "points" of I/O (globally!)
            sleep for vacuum_cost_delay ms
  31. Controlling I/O Impact: vacuum_cost_limit and vacuum_cost_delay
        while (table_not_done)
            do vacuum_cost_limit "points" of I/O (globally!)
            sleep for vacuum_cost_delay ms
      For autovacuum:
        vacuum_cost_limit --> autovacuum_vacuum_cost_limit
        vacuum_cost_delay --> autovacuum_vacuum_cost_delay
      ("Globally" because the autovacuum cost limit is shared among all running autovacuum workers.)
  32. Tuning Memory Use
      maintenance_work_mem: How much memory can a vacuum, index creation, or other maintenance op use?
      autovacuum_work_mem: Same as above, but specific to autovacuum processes. (Defaults to maintenance_work_mem.)
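Why memory matters for vacuum specifically: in PostgreSQL releases before 17, vacuum remembers each dead row's location as a 6-byte tuple identifier in one array capped at 1 GB, and every time that array fills it must stop and scan all of the table's indexes. The back-of-the-envelope sketch below assumes that older layout (PostgreSQL 17 switched to a more compact structure) and the 64 MB default maintenance_work_mem; the function names are hypothetical.

```python
import math

TID_BYTES = 6           # one ItemPointerData (block number + line pointer)
DEAD_TID_CAP = 1024**3  # before PostgreSQL 17, vacuum used at most 1 GB for dead TIDs
MiB = 1024**2


def dead_tuples_per_pass(work_mem_bytes: int) -> int:
    """How many dead-row pointers fit in memory before vacuum must scan the indexes."""
    return min(work_mem_bytes, DEAD_TID_CAP) // TID_BYTES


def index_scan_passes(dead_tuples: int, work_mem_bytes: int) -> int:
    """Number of full passes over every index needed to clean `dead_tuples` rows."""
    return math.ceil(dead_tuples / dead_tuples_per_pass(work_mem_bytes))


print(dead_tuples_per_pass(64 * MiB))             # 11184810
print(index_scan_passes(50_000_000, 64 * MiB))    # 5
print(index_scan_passes(50_000_000, 1024 * MiB))  # 1
```

On a table with large indexes, cutting five full index scans down to one is often a bigger win than any I/O throttling change.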
  33. What's an I/O "point"?
      vacuum_cost_page_hit: cost of vacuuming a buffer that was already in the buffer cache. (Default 1)
      vacuum_cost_page_miss: cost of vacuuming a buffer that we had to read in from the filesystem. (Default 10; 2 since PostgreSQL 14)
      vacuum_cost_page_dirty: cost charged when vacuum modifies (dirties) a buffer that was previously clean. (Default 20)
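Putting slides 30 and 33 together: the loop below charges each page its cost and sleeps whenever the accumulated points reach the limit. It is a simplified sketch with hypothetical names: real PostgreSQL can sleep somewhat longer than the delay when the balance overshoots, and a page that is both read in and dirtied is charged for both. The 200 / 20 ms pair matches the autovacuum defaults before PostgreSQL 12, which lowered autovacuum_vacuum_cost_delay to 2 ms.

```python
# Per-page costs from slide 33 (pre-PostgreSQL-14 defaults)
PAGE_COST = {"hit": 1, "miss": 10, "dirty": 20}
PAGE_BYTES = 8192  # default PostgreSQL block size


def simulated_sleep_ms(pages, cost_limit=200, cost_delay_ms=20):
    """Total time spent sleeping while vacuuming `pages`, following the slide-30 loop."""
    balance = 0
    slept = 0
    for kind in pages:
        balance += PAGE_COST[kind]
        if balance >= cost_limit:
            slept += cost_delay_ms
            balance = 0
    return slept


def max_rate_mib_s(cost_limit, cost_delay_ms, page_cost):
    """Upper bound on vacuum throughput, ignoring the time the I/O itself takes."""
    pages_per_second = (cost_limit / page_cost) * (1000 / cost_delay_ms)
    return pages_per_second * PAGE_BYTES / 1024**2


print(simulated_sleep_ms(["miss"] * 1000))  # 1000: 50 sleeps of 20 ms
print(max_rate_mib_s(200, 20, 10))          # 7.8125 MiB/s of cache misses
print(max_rate_mib_s(10_000, 5, 10))        # 1562.5 MiB/s with the case-study settings
```

The defaults cap a cold-cache vacuum at under 8 MiB/s per cluster, which is easy to outpace with write traffic on a busy machine.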
  34. Row-Freezing Trade-Offs
      vacuum_freeze_min_age: How old does an XID have to be before we freeze it? (Default 50M)
  35. Row-Freezing Trade-Offs
      vacuum_freeze_min_age: How old does an XID have to be before we freeze it? (Default 50M)
      vacuum_freeze_table_age: How often should we do an "aggressive" (more thorough, whole-table) vacuum? (Default 150M)
  36. Row-Freezing Trade-Offs
      vacuum_freeze_min_age: How old does an XID have to be before we freeze it? (Default 50M)
      vacuum_freeze_table_age: How often should we do an "aggressive" (more thorough, whole-table) vacuum? (Default 150M)
      autovacuum_freeze_max_age: How old can a table's oldest XID be before we trigger an aggressive vacuum on it, even with autovacuum off? (Default 200M)
  37. Row-Freezing TL;DR
      vacuum_freeze_min_age: Default is reasonable; lower it for static / append-only use cases.
      vacuum_freeze_table_age: Probably want 80-90% of autovacuum_freeze_max_age.
      autovacuum_freeze_max_age: Probably want ~1 billion or higher, assuming 250 MB of commit-log storage is not a big deal.
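The "250 MB" figure follows from the commit log (pg_xact) storing two status bits per transaction, so keeping autovacuum_freeze_max_age transactions of history costs about that many quarter-bytes. PostgreSQL also silently caps the effective vacuum_freeze_table_age at 95% of autovacuum_freeze_max_age, which is one reason the TL;DR suggests staying a little below it. A small sketch of both rules (function names are hypothetical):

```python
def commit_log_bytes(freeze_max_age: int) -> int:
    """Approximate commit-log (pg_xact) size: 2 status bits per transaction."""
    return freeze_max_age * 2 // 8


def effective_freeze_table_age(freeze_table_age: int, freeze_max_age: int) -> int:
    """VACUUM silently limits vacuum_freeze_table_age to 95% of autovacuum_freeze_max_age."""
    return min(freeze_table_age, freeze_max_age * 95 // 100)


print(commit_log_bytes(200_000_000) / 1e6)    # 50.0 (MB, with the default)
print(commit_log_bytes(1_000_000_000) / 1e6)  # 250.0 (MB)
print(effective_freeze_table_age(850_000_000, 1_000_000_000))  # 850000000 (85%, under the cap)
```

Integer arithmetic is used for the 95% cap to avoid floating-point rounding at these magnitudes.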
  38. What tables should be autovacuumed?
      autovacuum_vacuum_scale_factor: What fraction of the table can be dead rows before it should be autovacuumed? (Default 0.2, i.e. 20%)
      autovacuum_vacuum_threshold: How many dead rows, at minimum, need to be in the table before we vacuum it? (Default 50)
  39. What tables should be autovacuumed?
      autovacuum_vacuum_scale_factor: What fraction of the table can be dead rows before it should be autovacuumed? (Default 0.2, i.e. 20%)
      autovacuum_vacuum_threshold: How many dead rows, at minimum, need to be in the table before we vacuum it? (Default 50)
      A table is vacuumed when: num_dead_rows > (scale_factor * num_rows) + threshold
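The trigger condition from slide 39, written as a function with PostgreSQL's default settings. In PostgreSQL itself, `num_rows` is the planner's row estimate (pg_class.reltuples) and the dead-row count comes from the statistics collector (n_dead_tup in pg_stat_user_tables); `needs_vacuum` is a hypothetical name.

```python
def needs_vacuum(num_rows: float, num_dead_rows: int,
                 scale_factor: float = 0.2, threshold: int = 50) -> bool:
    """Slide 39's rule: vacuum once dead rows exceed scale_factor * num_rows + threshold."""
    return num_dead_rows > scale_factor * num_rows + threshold


print(needs_vacuum(1_000_000, 200_000))          # False: the limit is 200,050 dead rows
print(needs_vacuum(1_000_000, 200_051))          # True
print(needs_vacuum(0, 51))                       # True: tiny tables are driven by the threshold
print(needs_vacuum(1_000_000_000, 150_000_000))  # False: a billion-row table tolerates 200M dead rows
```

The last line shows why large tables are often given a smaller per-table scale factor.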
  40. Other Tunables
      autovacuum_max_workers: How many autovacuums / autoanalyzes are allowed to run at a time? (Default 3)
      autovacuum_naptime: How long should we wait between starting new autovacuum jobs? (Default 60s)
  41. Autovacuum Pseudocode
        while (true)
            if (autovacuum_max_workers already running)
                wait_until_one_finishes()
            if (exists a table with XIDs older than autovacuum_freeze_max_age)
                spawn a process to VACUUM that table in aggressive mode
            else if (exists a table with dead_rows > (autovacuum_vacuum_scale_factor * num_rows) + autovacuum_vacuum_threshold)
                spawn a process to VACUUM that table
            sleep for autovacuum_naptime
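One iteration of slide 41's loop, made runnable. It keeps the slide's simplified model; in PostgreSQL the launcher works per database and each worker goes on to check every table in its database. `TableStats` and `next_job` are hypothetical names, and the defaults mirror the settings above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TableStats:  # hypothetical summary of what autovacuum looks at
    name: str
    num_rows: float
    dead_rows: int
    oldest_xid_age: int  # age of the table's oldest unfrozen XID


def next_job(tables: List[TableStats], running: int, max_workers: int = 3,
             freeze_max_age: int = 200_000_000, scale_factor: float = 0.2,
             threshold: int = 50) -> Optional[Tuple[str, str]]:
    """Pick the next table and vacuum mode, or None if nothing can or should start."""
    if running >= max_workers:
        return None  # the slide's loop would wait for a worker to finish
    for t in tables:  # wraparound prevention takes priority
        if t.oldest_xid_age > freeze_max_age:
            return t.name, "aggressive"
    for t in tables:
        if t.dead_rows > scale_factor * t.num_rows + threshold:
            return t.name, "normal"
    return None


tables = [
    TableStats("bloated", num_rows=1_000, dead_rows=5_000, oldest_xid_age=1_000),
    TableStats("ancient", num_rows=1_000, dead_rows=0, oldest_xid_age=250_000_000),
]
print(next_job(tables, running=0))  # ('ancient', 'aggressive')
print(next_job(tables, running=3))  # None: all workers busy
```

Checking wraparound first reflects the slide's ordering: freezing is mandatory, dead-row cleanup is not.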
  42. "When in doubt, VACUUM more, not less."
  43. Case Study: Heap 2014
  44. Problems Start With The Schema...
        CREATE TABLE user_events (
            customer_id BIGINT,
            user_id BIGINT,
            properties JSONB NOT NULL,
            events JSONB[] NOT NULL,
            PRIMARY KEY (customer_id, user_id)
        );
      ... x 10,000 shards of this per DB
  45. Explosive Bloat
      This is a 3.2 TB machine!
  46. About 500 GB of bloat!
  47. About 500 GB of bloat!
      VACUUM FULL Friday night
  48. AUTOVACUUM STRATEGY I
      AUTOVACUUM STRATEGY II
  49. • Are we sure this is bloat, vs. lots of data coming in?
        ‣ Yes, because VACUUM FULL brings the space back down.
      • We have spare I/O. Can we make vacuums more aggressive?
        ‣ Increase autovacuum_vacuum_cost_limit from 200 to 10000.
        ‣ Decrease autovacuum_vacuum_cost_delay from 20ms to 5ms.
  50. • Does autovacuum think tables should be vacuumed?
        ‣ Yes, so bad table stats aren't the issue.
        ‣ Many tables are 10x bloated or more, so decreasing autovacuum_vacuum_scale_factor is also not the solution.
  51. • Are vacuums happening?
        ‣ No, or only briefly. So we aren't resource-limited.
        ‣ What if we try allowing more vacuums at a time? I.e., increase autovacuum_max_workers from 3 to 25.
  52. • Still not seeing more vacuums: judging by the logs, we aren't doing any more of them!
        ‣ The issue is that we'll start at most one new vacuum every autovacuum_naptime, so the default of 60s is way too long for our context.
        ‣ Decrease autovacuum_naptime from 60s to 10s to 1s.
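Under the slide's model of at most one new autovacuum start per autovacuum_naptime, the start rate (not worker count or I/O budget) bounds how fast a backlog of tables gets picked up. PostgreSQL's documentation describes the launcher as starting one worker per database every autovacuum_naptime, so read this as a per-database estimate. The backlog size is a hypothetical input; 10,000 simply echoes the shard count from slide 44.

```python
def hours_to_start_all(tables_waiting: int, naptime_s: float) -> float:
    """Hours until every waiting table has had a vacuum started, at one start per naptime."""
    return tables_waiting * naptime_s / 3600


for naptime in (60, 10, 1):
    print(naptime, round(hours_to_start_all(10_000, naptime), 1))
# 60 166.7
# 10 27.8
# 1 2.8
```

Going from 60s to 1s turns roughly a week of startup latency into under three hours, which matches the direction of the fix on the slide.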
  53. Questions? Or, ask me on Twitter: @danlovesproofs
      Diagnostics here: https://github.com/heap/pg-autovacuum-diagnostics