Query Planning Gone Wrong
Robert Haas
Chief Architect, Database Server
Why This Talk?
● 2010: The PostgreSQL Query Planner (Robert Haas)
● How does the query planner actually work from a user
perspective? What does it really do?
● Very common audience question: What do I do when the query
planner fails? How do I fix my query?
● 2011: Hacking the Query Planner (Tom Lane)
● How does the query planner actually work from a developer
perspective? What does it *really* do?
● Plea for help to improve the query planner.
● But... what should we be improving?
Methodology: Which Problems Matter?
● Read hundreds of email threads on pgsql-performance over a
period of almost two years.
● Disregarded all those that were not about query performance
problems.
● Decided what I thought the root cause (or, occasionally, causes) of
each complaint was.
● Skipped a very small number where I couldn't form an opinion.
● Counted the number of times each problem was reported.
Methodology: Possible Critiques
● The problems reported on pgsql-performance aren't necessarily
representative of all the problems PostgreSQL users encounter
(reporting bias).
● In particular, confusing problems might be more likely to be
reported.
● I might not have correctly identified the cause of each problem
(researcher bias).
● Others?
Statistically Speaking, Why Is My Query Slow? (168)
● Settings (23). Includes anything you can fix with postgresql.conf
changes, DDL, or operating system settings changes.
● Just Plain Slow (23). Includes anything that amounts to an
unreasonable expectation on the part of the user. These are often
questions of the form “why is query A slower than query B?” when
A is actually doing something much more expensive than B.
● We're Bad At That (22). Includes anything that could be faster in
some other database product, but isn't fast in PostgreSQL for
some reason (not implemented yet, or architectural artifact).
● Planner Error (83). Bad decisions about the cost of one plan vs.
another plan due to limitations of the optimizer.
● Bugs (14). Bugs in the query planner, or in one case, the Linux
kernel.
● User Error (3). User got confused and did something illogical.
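As a rough illustration of how the six categories above divide the 168 classified complaints, the counts can be turned into percentages. The counts are copied from the slide; the percentages and the helper name `share` are additions for illustration and were not part of the talk.

```python
# Counts copied from the slide above; the percentages are derived here.
categories = {
    "Settings": 23,
    "Just Plain Slow": 23,
    "We're Bad At That": 22,
    "Planner Error": 83,
    "Bugs": 14,
    "User Error": 3,
}

# Should match the 168 shown in the slide title.
total = sum(categories.values())


def share(name: str) -> float:
    """Return the category's share of all classified complaints, in percent."""
    return 100.0 * categories[name] / total


if __name__ == "__main__":
    # Print categories from most to least frequent.
    for name, count in sorted(categories.items(), key=lambda kv: -kv[1]):
        print(f"{name:<18} {count:>3}  {share(name):5.1f}%")
```

By this tally, planner errors account for a little under half of all complaints, which is why the rest of the talk concentrates on them.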
Settings (23)
● Planner Cost Constants (8). Adjustments needed to
seq_page_cost, random_page_cost, and perhaps cpu_tuple_cost
to accurately model real costs.
● Missing Index (4)
● Cost for @@ Operator Is Too Low (2)
● work_mem Too Low (2)
● Statistics Target Too Low (2)
● Statistics Target Too High (1)
● n_distinct Estimates Aren't Accurate On Large Tables (1)
● Not Analyzing Tables Often Enough (1)
● TOAST Decompression is Slow (1)
● vm.zone_reclaim_mode = 1 Causes Extra Disk I/O (1)
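To see why the planner cost constants matter, here is a deliberately simplified sketch. The sequential scan formula (pages times seq_page_cost plus rows times cpu_tuple_cost) follows the PostgreSQL documentation, and the three constants are the documented defaults. The index scan formula is a toy assumption (one random page fetch per matching row), and the function names and table sizes are hypothetical.

```python
# Toy model only: PostgreSQL's real index scan costing is far more detailed.
# The three values below are the documented default settings.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01


def seq_scan_cost(pages, rows, seq_page_cost=SEQ_PAGE_COST,
                  cpu_tuple_cost=CPU_TUPLE_COST):
    """Read every page sequentially and process every row."""
    return pages * seq_page_cost + rows * cpu_tuple_cost


def naive_index_scan_cost(matching_rows, random_page_cost=RANDOM_PAGE_COST,
                          cpu_tuple_cost=CPU_TUPLE_COST):
    """Assumption: each matching row costs one random heap page fetch."""
    return matching_rows * (random_page_cost + cpu_tuple_cost)


def cheaper_plan(pages, rows, matching_rows,
                 random_page_cost=RANDOM_PAGE_COST):
    """Return which of the two modeled plans has the lower estimated cost."""
    seq = seq_scan_cost(pages, rows)
    idx = naive_index_scan_cost(matching_rows, random_page_cost)
    return "index scan" if idx < seq else "seq scan"


if __name__ == "__main__":
    # Hypothetical table: 10,000 pages, 1,000,000 rows, 6,000 rows match.
    print(cheaper_plan(10_000, 1_000_000, 6_000))
    # Same table, but random reads modeled as nearly as cheap as sequential
    # ones, as might suit data that is mostly cached.
    print(cheaper_plan(10_000, 1_000_000, 6_000, random_page_cost=1.1))
```

In this model the same query flips from a sequential scan to an index scan when random_page_cost drops from 4.0 to 1.1. If the configured constants do not reflect the real hardware, the planner can pick the slower plan while its estimates look perfectly reasonable.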
Just Plain Slow (23)
● It Takes a While to Process a Lot of Data (6)
● Disks Are Slower Than Memory (6)
● Clauses Involving Multiple Tables Can't Be Pushed Down (2)
● Random I/O is Slower Than Sequential I/O (1)
● Linearly Scanning an Array is O(n) (1)
● One Regular Expression is Faster Than Two (1)
● Can't Figure Out Which Patterns Match a String Without Trying
Them All (1)
● xmlagg Is Much Slower Than string_agg (1)
● Scanning More Tables is Slower Than Scanning Fewer Tables (1)
● Replanning Isn't Free (1)
● Repeated Concatenation Using xmlconcat Is Slow (1)
● UNION is Slower than UNION ALL (1)
We're Bad At That (22)
● Plan Types We Can't Generate (11)
● Parameterized Paths (7). Two of these are post-9.2 complaints,
involving cases where 9.2 can't parameterize as needed.
● Merge Append (3). Fixed in 9.1.
● Batched Sort of Data Already Ordered By Leading Columns (1).
● Executor Limitations (3)
● Indexing Unordered Data Causes Random I/O (1)
● <> is Not Indexable (1)
● DISTINCT + HashAggregate Reads All Input Before Emitting
Any Results (1). This matters if there is a LIMIT.
● Architecture (8)
● No Parallel Query (2), Table Bloat (1), Backend Startup Cost (1),
Redundant Updates Are Expensive (1), AFTER Trigger Queue
Size (1), On-Disk Size of numeric (1), Autovacuum Not Smart
About Inherited Tables (1)
Planner Errors (83)
● Any guesses?
Planner Errors (83)
● Conceptual Errors (28). The planner isn't able to recognize that
two different queries are equivalent, so it doesn't even consider the
best plan.
● Estimation Errors (55). The planner considers the optimal plan, but
rejects it as too expensive.
● Row Count Estimation Errors (48). The planner mis-estimates
the number of rows that will be returned by some scan, join, or
aggregate.
● Cost Estimation Errors (7). The planner estimates the row
count correctly but incorrectly estimates the relative cost.
Grand Prize Winners
● Selectivity of filter conditions involving correlated columns is
estimated inaccurately (13)
● Suppose we want all the rows from a table where a = 1 and b =
1 and c = 1 and d = 1 and e = 1. The planner must estimate the
number of rows that will match, but only has statistics on each
column individually.
● Planner incorrectly thinks that “SELECT * FROM foo WHERE a = 1
ORDER BY b LIMIT n” will fill the limit after reading a small
percentage of the index (11)
● It can scan an index on b and filter for rows where a = 1.
● Or it can scan an index on a, find all rows where a = 1, and
perform a top-N sort.
● It often prefers the former when the latter would be faster.
● Can often be worked around with a composite or functional
index.
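Both problems above come down to simple arithmetic going wrong. The sketch below is a simplified model of the reasoning, not the planner's actual code; the function names, table sizes, and selectivities are hypothetical. The first function multiplies per-column selectivities as if the columns were independent. The second estimates how many rows an index scan on b must read to fill a LIMIT, assuming the rows with a = 1 are spread evenly through the index order.

```python
def independent_estimate(table_rows, selectivities):
    """Row estimate when per-column selectivities are multiplied together,
    i.e. the columns are treated as statistically independent."""
    estimate = float(table_rows)
    for s in selectivities:
        estimate *= s
    return estimate


def rows_read_to_fill_limit(table_rows, matching_rows, limit):
    """Expected rows read from an index on b before `limit` rows with a = 1
    are found, assuming matches are spread evenly through the index order."""
    if matching_rows == 0:
        # No row matches, so the scan reads the whole index.
        return float(table_rows)
    return min(float(table_rows), limit * table_rows / matching_rows)


if __name__ == "__main__":
    # Five conditions, each matching 10% of a 1,000,000-row table.
    # Independence predicts about 10 rows; if the five columns always
    # carry the same value, the true answer is 100,000 rows.
    print(independent_estimate(1_000_000, [0.1] * 5))

    # 1,000 of 1,000,000 rows have a = 1 and the query has LIMIT 10.
    # The even-spread assumption predicts 10,000 rows read; if the matches
    # all sit at the far end of the b ordering, nearly the whole index
    # is read instead.
    print(rows_read_to_fill_limit(1_000_000, 1_000, 10))
```

In both cases the estimate is only as good as its uniformity or independence assumption, and real data often violates it.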
Planner Error: Row Count Estimation – Others (24)
● Using WITH Results in a Bad Plan (5). Some of these are query
flattening issues, while others result from failure to dig out variable
statistics.
● Generic Plans Can Have Wildly Wrong Estimates (4). Improved.
● Selectivity Estimates on Arbitrary Expressions are Poor (4)
● Join Selectivity Doesn't Know about Cross-Table Correlations (3)
● Uncommitted Tuples Don't Affect Statistics (2)
● No Stats for WITH RECURSIVE (1) or GROUP BY (1) Results
● Redundant Equality Constraints Not Identified As Such (1)
● IN/NOT IN Estimation Doesn't Assume Array Elements Distinct (1).
Fixed.
● Histogram Bounds Can Slide Due to New Data (1). Fixed.
● Inheritance Parents Aren't Assumed to be Completely Empty (1).
Fixed.
Planner Error: Cost Estimation (7)
● Planner doesn't account for de-TOASTing cost (4)
● Plan change causes volume of data to exceed server memory (2)
● Hash join sometimes decides to hash the larger table when it
should probably be hashing the smaller one (1)
Planner Error: Conceptual (28)
● Cross-data type comparisons are not always indexable (3)
● Inlining the same thing multiple times can lose (3)
● NOT IN is hard to optimize – and we don't try very hard (3)
● Target lists are computed too early or unnecessary targets are
computed (3)
● Can't rewrite SELECT max(a) FROM foo WHERE b IN (…) as max of
index scans (2)
● Can't rearrange joins and aggregates relative to one another (2)
● Can't deduce implied inequalities (2)
● Ten other issues that came up once each
Thank You
● Any questions?

Robert Haas Query Planning Gone Wrong Presentation @ Postgres Open

  • 1. Click to edit Master subtitle style 1 Query Planning Gone Wrong Robert Haas Chief Architect, Database Server
  • 2. Why This Talk? ● 2010: The PostgreSQL Query Planner (Robert Haas) ● How does the query planner actually work from a user perspective? What does it really do? ● Very common audience question: What do I do when the query planner fails? How do I fix my query? ● 2011: Hacking the Query Planner (Tom Lane) ● How does the query planner actually work from a developer perspective? What does it *really* do? ● Plea for help to improve the query planner. ● But... what should we be improving?
  • 3. Methodology: Which Problems Matter?
    ● Read hundreds of email threads on pgsql-performance over a period of almost two years.
    ● Disregarded all those that were not about query performance problems.
    ● Decided what I thought the root cause (or, occasionally, causes) of each complaint was.
    ● Skipped a very small number where I couldn't form an opinion.
    ● Counted the number of times each problem was reported.
  • 4. Methodology: Possible Critiques
    ● The problems reported on pgsql-performance aren't necessarily representative of all the problems PostgreSQL users encounter (reporting bias).
      ● In particular, confusing problems might be more likely to be reported.
    ● I might not have correctly identified the cause of each problem (researcher bias).
    ● Others?
  • 5. Statistically Speaking, Why Is My Query Slow? (168)
    ● Settings (23). Includes anything you can fix with postgresql.conf changes, DDL, or operating system settings changes.
    ● Just Plain Slow (23). Includes anything that amounts to an unreasonable expectation on the part of the user. These are often questions of the form “why is query A slower than query B?” when A is actually doing something much more expensive than B.
    ● We're Bad At That (22). Includes anything that could be faster in some other database product, but isn't fast in PostgreSQL for some reason (not implemented yet, or architectural artifact).
    ● Planner Error (83). Bad decisions about the cost of one plan vs. another plan due to limitations of the optimizer.
    ● Bugs (14). Bugs in the query planner, or in one case, the Linux kernel.
    ● User Error (3). User got confused and did something illogical.
  • 6. Settings (23)
    ● Planner Cost Constants (8). Adjustments needed to seq_page_cost, random_page_cost, and perhaps cpu_tuple_cost to accurately model real costs.
    ● Missing Index (4)
    ● Cost for @@ Operator Is Too Low (2)
    ● work_mem Too Low (2)
    ● Statistics Target Too Low (2)
    ● Statistics Target Too High (1)
    ● n_distinct Estimates Aren't Accurate On Large Tables (1)
    ● Not Analyzing Tables Often Enough (1)
    ● TOAST Decompression is Slow (1)
    ● vm.zone_reclaim_mode = 1 Causes Extra Disk I/O (1)
  • 7. Just Plain Slow (23)
    ● It Takes a While to Process a Lot of Data (6)
    ● Disks Are Slower Than Memory (6)
    ● Clauses Involving Multiple Tables Can't Be Pushed Down (2)
    ● Random I/O is Slower Than Sequential I/O (1)
    ● Linearly Scanning an Array is O(n) (1)
    ● One Regular Expression is Faster Than Two (1)
    ● Can't Figure Out Which Patterns Match a String Without Trying Them All (1)
    ● xmlagg Is Much Slower Than string_agg (1)
    ● Scanning More Tables is Slower Than Scanning Fewer Tables (1)
    ● Replanning Isn't Free (1)
    ● Repeated Concatenation Using xmlconcat Is Slow (1)
    ● UNION is Slower than UNION ALL (1)
  • 8. We're Bad At That (22)
    ● Plan Types We Can't Generate (11)
      ● Parameterized Paths (7). Two of these are post-9.2 complaints, involving cases where 9.2 can't parameterize as needed.
      ● Merge Append (3). Fixed in 9.1.
      ● Batched Sort of Data Already Ordered By Leading Columns (1).
    ● Executor Limitations (3)
      ● Indexing Unordered Data Causes Random I/O (1)
      ● <> is Not Indexable (1)
      ● DISTINCT + HashAggregate Reads All Input Before Emitting Any Results (1). This matters if there is a LIMIT.
    ● Architecture (8)
      ● No Parallel Query (2), Table Bloat (1), Backend Startup Cost (1), Redundant Updates Are Expensive (1), AFTER Trigger Queue Size (1), On-Disk Size of numeric (1), Autovacuum Not Smart About Inherited Tables (1)
  • 9. Planner Errors (83)
    ● Any guesses?
  • 10. Planner Errors (83)
    ● Conceptual Errors (28). The planner isn't able to recognize that two different queries are equivalent, so it doesn't even consider the best plan.
    ● Estimation Errors (55). The planner considers the optimal plan, but rejects it as too expensive.
      ● Row Count Estimation Errors (48). The planner mis-estimates the number of rows that will be returned by some scan, join, or aggregate.
      ● Cost Estimation Errors (7). The planner estimates the row count correctly but incorrectly estimates the relative cost.
  • 11. Grand Prize Winners
    ● Selectivity of filter conditions involving correlated columns is estimated inaccurately (13)
      ● Suppose we want all the rows from a table where a = 1 and b = 1 and c = 1 and d = 1 and e = 1. The planner must estimate the number of rows that will match, but only has statistics on each column individually.
    ● Planner incorrectly thinks that “SELECT * FROM foo WHERE a = 1 ORDER BY b LIMIT n” will fill the limit after reading a small percentage of the index (11)
      ● It can scan an index on b and filter for rows where a = 1.
      ● Or it can scan an index on a, find all rows where a = 1, and perform a top-N sort.
      ● It often prefers the former when the latter would be faster.
      ● Can often be worked around with a composite or functional index.
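
Both grand prize winners come down to simple arithmetic, which can be sketched outside the database. The following is a toy model in Python, not PostgreSQL's estimator: it assumes the planner multiplies per-column selectivities together (independence) and assumes matching rows are spread evenly through the index on b. The tables `correlated` and `skewed` and both helper functions are hypothetical, made up for illustration.

```python
def estimate_vs_actual(rows, predicates):
    """Return (independence-based estimate, true count) for equality predicates."""
    total = len(rows)
    estimate = float(total)
    # Multiply in each column's selectivity separately, as if columns were independent.
    for col, val in predicates.items():
        matching = sum(1 for r in rows if r[col] == val)
        estimate *= matching / total
    actual = sum(1 for r in rows if all(r[c] == v for c, v in predicates.items()))
    return estimate, actual


def rows_read_before_limit(rows, a_value, limit):
    """Walk rows in b order, like an index scan on b, until `limit` rows match a."""
    found = 0
    for read, row in enumerate(sorted(rows, key=lambda r: r["b"]), start=1):
        if row["a"] == a_value:
            found += 1
            if found == limit:
                return read
    return len(rows)


# Hypothetical table 1: columns a..e always hold the same value (fully correlated).
correlated = [{c: i % 10 + 1 for c in "abcde"} for i in range(10_000)]
estimate, actual = estimate_vs_actual(correlated, {c: 1 for c in "abcde"})
print(f"estimated rows: {estimate:.2f}, actual rows: {actual}")

# Hypothetical table 2: every row with a = 1 sits at the high end of b.
skewed = [{"a": 1 if i >= 9_000 else 0, "b": i} for i in range(10_000)]
limit = 10
selectivity = sum(1 for r in skewed if r["a"] == 1) / len(skewed)
expected_reads = limit / selectivity  # what a uniform spread would predict
actual_reads = rows_read_before_limit(skewed, 1, limit)
print(f"expected index rows read: {expected_reads:.0f}, actual: {actual_reads}")
```

In the first table, five individually reasonable selectivities of 0.1 multiply down to a row estimate far below the true count. In the second, a scan in b order has to pass every non-matching row before it finds the first match, which is the case where scanning an index on a and doing a top-N sort would win.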
  • 12. Planner Error: Row Count Estimation – Others (24)
    ● Using WITH Results in a Bad Plan (5). Some of these are query flattening issues, while others result from failure to dig out variable statistics.
    ● Generic Plans Can Have Wildly Wrong Estimates (4). Improved.
    ● Selectivity Estimates on Arbitrary Expressions are Poor (4)
    ● Join Selectivity Doesn't Know about Cross-Table Correlations (3)
    ● Uncommitted Tuples Don't Affect Statistics (2)
    ● No Stats for WITH RECURSIVE (1) or GROUP BY (1) Results
    ● Redundant Equality Constraints Not Identified As Such (1)
    ● IN/NOT IN Estimation Doesn't Assume Array Elements Distinct (1). Fixed.
    ● Histogram Bounds Can Slide Due to New Data (1). Fixed.
    ● Inheritance Parents Aren't Assumed to be Completely Empty (1). Fixed.
  • 13. Planner Error: Cost Estimation (7)
    ● Planner doesn't account for de-TOASTing cost (4)
    ● Plan change causes volume of data to exceed server memory (2)
    ● Hash join sometimes decides to hash the larger table when it should probably be hashing the smaller one (1)
  • 14. Planner Error: Conceptual (28)
    ● Cross-data type comparisons are not always indexable (3)
    ● Inlining the same thing multiple times can lose (3)
    ● NOT IN is hard to optimize – and we don't try very hard (3)
    ● Target lists are computed too early or unnecessary targets are computed (3)
    ● Can't rewrite SELECT max(a) FROM foo WHERE b IN (…) as max of index scans (2)
    ● Can't rearrange joins and aggregates relative to one another (2)
    ● Can't deduce implied inequalities (2)
    ● Ten other issues that came up once each
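
The max(a) item is a case where the user can do by hand the rewrite the planner does not attempt. The sketch below runs both forms against SQLite, through Python's standard sqlite3 module, as a stand-in for PostgreSQL. It only checks that the two queries return the same answer and says nothing about how either database plans them. The table `foo` follows the slide, while the index name `foo_b_a`, the sample data, and the values 1 and 3 in the IN list are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INTEGER, b INTEGER)")
# A composite index on (b, a) is what would let each per-b max be read from one index end.
conn.execute("CREATE INDEX foo_b_a ON foo (b, a)")
conn.executemany(
    "INSERT INTO foo (a, b) VALUES (?, ?)",
    [(i, i % 5) for i in range(100)],
)

# Original form: a single aggregate over an IN list.
original = conn.execute(
    "SELECT max(a) FROM foo WHERE b IN (1, 3)"
).fetchone()[0]

# Manual rewrite: one max per value of b, then the max of those results.
rewritten = conn.execute(
    """
    SELECT max(m) FROM (
        SELECT max(a) AS m FROM foo WHERE b = 1
        UNION ALL
        SELECT max(a) AS m FROM foo WHERE b = 3
    ) AS per_b
    """
).fetchone()[0]

print(original, rewritten)
conn.close()
```

Each branch of the UNION ALL is a max over a single value of b, the shape the slide calls a max of index scans.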
  • 15. Thank You
    ● Any questions?