Cardinality Estimation
Estimated vs actual fight to the death.
Before we start, print page 1 of this PDF:
BrentOzar.com/go/engine
99-05: dev, architect, DBA
05-08: DBA, VM, SAN admin
08-10: MCM, Quest Software
Since: consulting DBA
www.BrentOzar.com
Help@BrentOzar.com
Typical weekend errand
Go to Binny’s.
• If they have our champagne, buy it all.
Then go to Whole Foods.
• If you picked up champagne, get a lobster.
• Otherwise, get one pound of salmon.
Bad idea
Go to Binny’s.
• If they have our champagne, buy it all.
Then go to Whole Foods.
• If you picked up champagne,
get one lobster for each bottle of champagne.
• Otherwise, get one pound of salmon.
Execution plans
are like errands.
Typical query errand
Go to the Users table.
• Find the most popular Location.
Go back to the Users table.
• Get all of the people in that Location
• Sort them by DisplayName
Cardinality estimation is hard.
When SQL Server’s estimates
are reasonably close to actuals,
you’re getting a good plan.
It may not be a fast plan,
but it’ll accurately reflect the amount of work.
(You could reduce work by
changing the query or the indexes.)
When SQL Server’s estimates
are nowhere near actuals (like 100x-1,000x off)
you’re usually getting a bad plan.
Easy cardinality:
GROUP BY Location
Let’s start thinking like the engine again…
This demo uses MAXDOP 1.
It’s not that it’s a good idea.
I just want to keep this execution plan really simple.
Parallelism makes this plan more complex to read.
Don’t worry, the tuning answer isn’t “use more cores.”
Build an execution plan for this.
SELECT Location, COUNT(*)
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC
OPTION (MAXDOP 1);
And tell me how much memory you’ll need
(to keep it simple, how many rows will you handle?)
Let’s see what SQL estimates.
Get the estimated execution plan.
In SSMS, click Display Estimated Execution Plan,
or press CTRL + L.
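If you'd rather script it than click, here's a minimal sketch using SET SHOWPLAN_XML, which asks SQL Server to return the estimated plan as XML instead of running the query. It assumes you're in SSMS or sqlcmd (GO is their batch separator, not T-SQL) and that you're in the database that holds dbo.Users.

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO

-- This query is compiled but NOT executed; you get the estimated plan back.
SELECT Location, COUNT(*)
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC
OPTION (MAXDOP 1);
GO

-- Turn it back off, or every later query in this session only returns plans.
SET SHOWPLAN_XML OFF;
GO
```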
Right to left, top to bottom
Read the way the arrows point
Step 1: Clustered index scan
Another way: check the arrows
The arrows coming out of most operators are a
faster/easier way to check row estimations.
Step 2: Hash Match Aggregate
Step 3: Compute Scalar
Step 4: Sort
Step 5: SELECT
Cost
Each statement has a total cost.
A statement’s cost is the
sum of the cost of all operators
in a statement.
Cost is just an arbitrary metric of the estimated cost
to get work done.
We call that Query Bucks.
BrentOzar.com/go/querybucks
To discover if our estimates are wrong,
we have to run it and get the actual plan.
Getting an actual plan
The only way to get an actual plan is to capture it at
the exact moment when the query finishes, like with:
• SSMS, with actual plans turned on
• Extended Events
• Profiler trace
You can’t get the actual plan from the DMVs after the
query finishes. You’ll see why.
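For the SSMS route, one way to capture the actual plan from a script is SET STATISTICS XML. This is a sketch, not the only approach: the query really runs, and the plan (with actual row counts) comes back as an extra XML result when it finishes.

```sql
-- Return the actual plan as XML after each statement executes.
SET STATISTICS XML ON;

SELECT Location, COUNT(*)
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC
OPTION (MAXDOP 1);

-- Stop returning plans for later statements in this session.
SET STATISTICS XML OFF;
```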
Estimated plan top, actual bottom
Spot the difference between them?
Hover over the
hash match
Estimated plan at left,
actual plan at right.
The actual plan adds
actual row counts.
Every time a query runs,
these can change:
• Warnings
• Row counts
• Spills
• Numbers of executions
• Memory grants
• Wait stats
• Join types (in adaptive plans)
The shape of the est vs actual plan is the
same, but lots of metrics can change.
Step 1: Clustered
Index Scan
Estimates vs actuals are right.
No surprise: it’s just a count of
all the rows in the table, and the
table contents aren’t changing.
Arrows: handy shortcuts
Hover the mouse over the arrows to see mismatches.
Doesn’t show actual data size, though.
Step 2:
Hash match
aggregate
Now the estimate starts to matter:
SQL Server needs memory to do this, and
it only estimates enough memory to
handle the estimated number of rows.
When it runs out, it spills to TempDB.
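To watch the memory side of this while the query is still running, you can look at the memory grants DMV from a second session. A rough sketch, assuming you have VIEW SERVER STATE permission; it only shows queries that are executing or waiting on a grant right now, so short queries may be gone before you look.

```sql
-- Run this in a DIFFERENT session while the demo query is executing.
SELECT session_id,
       requested_memory_kb,  -- what the plan asked for, based on estimates
       granted_memory_kb,    -- what SQL Server actually handed over
       used_memory_kb,       -- what the query is using at this moment
       max_used_memory_kb,   -- the most it has used so far
       ideal_memory_kb       -- what it would take to do everything in memory
FROM sys.dm_exec_query_memory_grants
ORDER BY granted_memory_kb DESC;
```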
How many Locations came out?
We have 2 problems here:
• Our estimates were wrong
• Which means our memory estimates are wrong
for the rest of the plan, too
How accurate were
your own estimates?
SQL Server did the best it could.
There’s no magic here.
No person or system could possibly get this estimate
right without changing something about the database,
like storing the number of locations.
More on that in a minute.
Step 3: Compute Scalar
Step 4:
Sort
Our bad estimates
strike again: we didn’t
estimate enough
memory to do the sort.
So now we spill the
data to disk AGAIN.
Oops.
Estimation accuracy
matters a lot.
Step 5: SELECT
Every time the query runs…
We’re going to underestimate row counts.
We’re going to underallocate memory.
We’re going to write to TempDB.
We’re going to burn more CPU time sorting.
So how do we fix it?
Updating the existing statistics
OPTION (RECOMPILE)
Switching back to the 2012 Cardinality Estimator
Using 2017’s Adaptive Memory Grants
Adding a statistic (on what columns?)
Adding an index (on what columns?)
Does updating stats help?
No – we don’t have stats on the Location column.
We can update our brains out,
but nothing will change.
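To check for yourself which statistics exist on the table, a query along these lines lists each statistic, the columns it covers, and when it was last updated. It's a sketch that assumes a version with sys.dm_db_stats_properties (SQL Server 2008 R2 SP2 / 2012 SP1 or later).

```sql
-- One row per statistic column on dbo.Users.
SELECT s.name                  AS StatsName,
       c.name                  AS ColumnName,
       s.auto_created          AS AutoCreated,
       sp.last_updated         AS LastUpdated,
       sp.rows                 AS RowsWhenUpdated,
       sp.modification_counter AS ChangesSinceUpdate
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id
   AND sc.stats_id  = s.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id
   AND c.column_id = sc.column_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Users')
ORDER BY s.name, sc.stats_column_id;
```

If no row mentions Location, updating statistics has nothing on that column to refresh.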
OPTION (RECOMPILE)?
No, that doesn’t help either:
we built a fresh plan for this, and it was still wrong.
The Cardinality Estimator (CE)
The SQL Server engine code that guesses how many
rows will come back from operations
SQL Server 2005-2012: stayed the same
SQL Server 2014+: new CE available if you set your
database compatibility level to 2014 (120) or higher
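If you want to compare the two estimators on your own system, here are three ways to get the old CE, sketched under the assumption that you're on SQL Server 2016 SP1 or later for the first two. Pick one; you would not normally combine them.

```sql
-- Option 1 (SQL Server 2016+): keep the compatibility level,
-- but use the legacy CE for the whole current database.
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

-- Option 2 (SQL Server 2016 SP1+): legacy CE for just one query.
SELECT Location, COUNT(*)
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC
OPTION (MAXDOP 1, USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

-- Option 3: drop the current database to the 2012 compatibility level (110).
-- This changes more than the CE, so test before doing it for real.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 110;
```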
No winners here – they both lose.
Compatibility level 2012:
Compat 2014/16/17:
The new CE was just new.
It’s not magic.
It’s not even new anymore.
SQL 2017 helps – kinda.
Adaptive Memory Grants (officially, memory grant feedback): SQL Server tracks when
this happens, and starts adjusting memory grants
over time.
Unfortunately, in SQL Server 2017 it only works in batch mode:
meaning, we need a columnstore index in the query.
That’s not the right answer to this tuning problem.
Will adding a statistic help?
CREATE STATISTICS Stat_Location
ON dbo.Users(Location);
Even if it does help, you still have to sort every time:
Will adding an index help?
CREATE INDEX IX_Location
ON dbo.Users(Location);
Getting better:
And check out those estimates.
We know how many rows will come out
Rows are sorted: no hash match, now stream aggregate
We still have to sort by most popular locations though
Sort: still spills!
Look at estimated vs actual.
We knew we’d get 110k rows.
But SQL Server still didn’t
allocate enough memory.
Query tuning involves
Each plan operator tries to predict:
• How much data will come in
• How many resources we’ll need in order to
perform the required work in this operator
• How much data will go out to the next operator
Your job as a query tuner:
figure out where those are going wrong, and make
changes to make SQL Server’s life easier.
Changed with the index
We can change this too
Here’s the query again.
SELECT Location, COUNT(*)
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC
OPTION (MAXDOP 1);
Do we really need all locations every time this runs?
Or can we paginate the data in our application?
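If the application can page through the results, one possible shape is OFFSET / FETCH (SQL Server 2012 and later). This is a sketch: @PageNumber and @PageSize are hypothetical parameters the application would pass in, and Location is added to the ORDER BY as a tiebreaker so that pages don't shuffle between calls.

```sql
-- Hypothetical paging parameters supplied by the application.
DECLARE @PageNumber INT = 1,
        @PageSize   INT = 100;

SELECT Location, COUNT(*) AS UsersInLocation
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC, Location       -- tiebreaker keeps page order stable
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY
OPTION (MAXDOP 1);
```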
“Buy it all” is a bad errand.
Go to Binny’s.
• If they have our champagne, buy it all.
Then go to Whole Foods.
• If you picked up champagne, get a lobster.
• Otherwise, get one pound of salmon.
Let’s try just the top 100.
SELECT TOP 100 Location, COUNT(*)
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC
OPTION (MAXDOP 1);
No spills on the sort.
And now, the only yellow bang
is on the SELECT.
SQL Server is complaining
that we granted TOO MUCH
memory – and it’s just 1MB.
The sorts are different.
Sorting all rows, and keeping them all
Sorting all rows, but only keeping N of them
And the sort algorithm even changes
depending on what N is!
https://www.brentozar.com/archive/2017/09/much-can-one-row-change-query-plan-part-1/
What we learned so far
To get accurate estimates, the Cardinality Estimator (CE)
needs statistics.
Indexes get you those statistics.
You can’t just tune the queries in isolation:
the right indexes are a required foundation.
Even when estimates are completely accurate, you still may
not get the memory you might want to join/sort everything
in memory. (Especially for large real-world loads.)
Now let’s do a
2-part errand.
A one, and a two
Queries usually multiply data.
Go to Binny’s.
• If they have our champagne, buy it all.
Then go to Whole Foods.
• If you picked up champagne,
get one lobster for each bottle of champagne.
• Otherwise, get one pound of salmon.
Build an execution plan for this.
DECLARE @TopLocation NVARCHAR(100);
SELECT TOP 1 @TopLocation = Location
FROM dbo.Users
WHERE Location <> ''
GROUP BY Location
ORDER BY COUNT(*) DESC;
SELECT * FROM dbo.Users
WHERE Location = @TopLocation
ORDER BY DisplayName;
Variables only store 1 row.
We’re only going to put 1 value in @TopLocation.
If we thought really hard, we could possibly even
use statistics to predict what that value might be.
But now the SELECT runs.
How can it predict how many rows will return?
The whole plan is built at once.
Estimates are built at once, too.
Of course the estimates are wrong.
Boo, hiss
SQL Server compiles batches.
It has to build an execution plan for the entire batch all
at the same time.
A stored procedure is a big batch.
You can put OPTION (RECOMPILE) on statements,
forcing SQL Server to build a new execution plan
given what it knows so far.
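Applied to the two-step errand, the hint goes on the second statement only. This sketch reuses the batch shown earlier and changes nothing else; by the time that statement is recompiled, @TopLocation already holds its value, so SQL Server can estimate for that specific Location.

```sql
DECLARE @TopLocation NVARCHAR(100);

-- Step 1: unchanged. Find the most popular non-empty Location.
SELECT TOP 1 @TopLocation = Location
FROM dbo.Users
WHERE Location <> ''
GROUP BY Location
ORDER BY COUNT(*) DESC;

-- Step 2: build a fresh plan for this statement at run time,
-- using the value that is now in @TopLocation.
SELECT * FROM dbo.Users
WHERE Location = @TopLocation
ORDER BY DisplayName
OPTION (RECOMPILE);
```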
New plan for this one statement
A different plan.
At first, this might look worse: a clustered index scan of the entire Users table,
ignoring the Location index. But it actually does fewer logical reads
because we’re dealing with a lot of Users. And note: no spill on the sort.
It’s asking for a missing index.
But it includes EVERY SINGLE FIELD.
Over here in reality, we can’t usually create indexes
like that. (If you can, great, do it – but notice AboutMe
and its horrible datatype.)
What OPTION (RECOMPILE) does
Forces SQL Server to stop and build a new plan
Takes effect at the level where you put it
Here, we’re recompiling a single statement because
what comes out of the prior query changes
EVERYTHING we do:
• The index we use, and the way we use it
• How much memory we need
Using recompile hints
They help when SQL Server needs to reset
expectations about what it’s about to do
But you have to know where to put them
Typically best used when:
• You’re doing multi-step processing
• The amount of data varies WIDELY
• You can identify the point where things change
dramatically, and stick the hint there
Possible fix:
Combine queries
I got rid of the variable.
Former query #1
Former query #2
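The combined query on the slide didn't survive in this transcript. Here's a sketch of what such a rewrite might look like, assuming a CTE named TopLocation (a hypothetical name) that holds former query #1 and is joined back to Users for former query #2.

```sql
-- Former query #1: the single most popular non-empty Location.
WITH TopLocation AS (
    SELECT TOP 1 Location
    FROM dbo.Users
    WHERE Location <> ''
    GROUP BY Location
    ORDER BY COUNT(*) DESC
)
-- Former query #2: everyone in that Location, with no variable in between.
SELECT u.*
FROM TopLocation AS t
JOIN dbo.Users AS u
    ON u.Location = t.Location
ORDER BY u.DisplayName;
```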
But the plan is built all at once.
SQL Server knows how many rows
the CTE will produce: just 1,
SELECT TOP 1.
But it has no idea what the
location will be.
It doesn’t execute the CTE first,
get the location, and then execute
the query.
This whole thing is done at once.
Read the plan right to left.
SQL Server’s doing the same thing it did before in the 2-query
process: first, it builds the list of most popular locations, takes
the top 1, and then looks up the users in that location.
Is this estimate going to be right?
Do we know how many locations we’ll find?
Scan the list of locations
Is this estimate going to be right?
Do we know how many locations the Sort will push out?
Sort them by COUNT(*)
This one is a little trickier.
We’re going to seek to that Location name. But 2 questions:
1. How many times are we going to seek to a Location?
2. How many rows are going to come out of the seek?
Seek to that Location name
From “Think Like the Engine”
A “seek” sounds like it’s only going to return 1 row.
A “scan” sounds like it returns the whole table.
But here’s what they really mean:
Seek = start reading at one specific location
Scan = start reading at either end of the object
Decoding it
Seek predicate:
jump to the Location that we found
in the earlier operations, and start
reading there.
We just don’t know what that
Location will be when we build the
plan.
We can’t possibly know.
Estimated rows
The estimate is just garbage.
SQL Server’s using the density vector
(more on that in Think Like the Engine
and Mastering Index Tuning.)
It’s guessing based on how many rows
the average Location has.
But we’re asking for the biggest one.
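To see where a guess like that comes from, you can look at the density vector and work out the average yourself. A sketch, assuming the IX_Location index created earlier exists: when the value isn't known at compile time, the estimate is roughly the "All density" figure multiplied by the number of rows in the table, which is the average number of rows per distinct Location.

```sql
-- Show the density vector for the statistic behind IX_Location.
DBCC SHOW_STATISTICS ('dbo.Users', 'IX_Location') WITH DENSITY_VECTOR;

-- Roughly the same arithmetic by hand: average rows per distinct Location.
-- COUNT(DISTINCT ...) skips NULLs, so this can differ slightly
-- from what the statistic reports.
SELECT COUNT(*)                 AS TotalRows,
       COUNT(DISTINCT Location) AS DistinctLocations,
       COUNT(*) * 1.0 / NULLIF(COUNT(DISTINCT Location), 0)
                                AS AvgRowsPerLocation
FROM dbo.Users;
```

The average is a fair guess for a typical Location and a poor one for the biggest.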
Actual rows
Estimates: 13
Actual: 37,810
These are way off, and it has a
cascading effect on the rest of
the plan.
This one is a little trickier.
So now, as we move through the rest of the plan, we’re going to
have issues.
Est 13, actual 37,810
This one is a little trickier.
For each User.Id we found, go get the rest of the fields that
weren’t included in our Location index. Again, 2 questions:
1. How many times are we going to do this key lookup?
2. How many rows are going to come out of each one?
Get the SELECT *
The entire popup
I’ll zoom in on this one part…
The important part
Dammit, Beavis
FOR THE LOVE OF ALL
THAT’S HOLY CAN
YOU PLEASE PUT THE
FIELDS IN SOME KIND
OF ORDER AND BE
CONSISTENT ABOUT
WHETHER YOU
PREFIX THINGS WITH
ACTUAL OR NOT
Decoding it
The label | The value | What it means
Estimated Number of Executions | 13 |
Number of Executions | 37,810 | Which is bad because each one produces a few logical reads, so we read more pages than the entire table
Estimated Number of Rows | 1 | The number of rows that will come out of EACH OPERATION, which is really misleading
Actual Number of Rows | 37,810 | The total number of rows that came out of ALL THE OPERATIONS
Is this estimate going to be right?
The Sort needed to estimate how many rows it’d be dealing
with, which affects our overall memory grant.
Uh no
This CTE is a great example.
Even in a small query like this:
• SQL Server processes data in order, in steps
• Can kinda be thought of as a stored procedure
• Estimates can go wrong in any step
Our job:
• Read the plan right to left, top to bottom
• Understand where estimates are going wrong
• Help SQL Server make better estimations
Whew.
What we learned
Cardinality estimation involves:
• Predicting how many rows will come back
• Guessing the contents of those rows to predict how
many rows will come back from other operations
Lots of ways to accomplish it with varying success:
• Updating the existing statistics
• OPTION (RECOMPILE)
• Switching back to the 2012 Cardinality Estimator
• 2017’s Adaptive Query Processing
• Changing the database (indexes, stats)
• Rewriting queries to break them up, or combine them
Wanna learn more?
Mastering Index Tuning
Mastering Query Tuning
Mastering Server Tuning
PASS Summit Pre-Con
BrentOzar.com/training
Getting Better Query Plans by Improving SQL's Estimates

Weitere ähnliche Inhalte

Mehr von Brent Ozar

Dynamic SQL: How to Build Fast Multi-Parameter Stored Procedures
Dynamic SQL: How to Build Fast Multi-Parameter Stored ProceduresDynamic SQL: How to Build Fast Multi-Parameter Stored Procedures
Dynamic SQL: How to Build Fast Multi-Parameter Stored ProceduresBrent Ozar
 
Headaches of Blocking, Locking, and Deadlocking
Headaches of Blocking, Locking, and DeadlockingHeadaches of Blocking, Locking, and Deadlocking
Headaches of Blocking, Locking, and DeadlockingBrent Ozar
 
"But It Worked In Development!" - 3 Hard SQL Server Problems
"But It Worked In Development!" - 3 Hard SQL Server Problems"But It Worked In Development!" - 3 Hard SQL Server Problems
"But It Worked In Development!" - 3 Hard SQL Server ProblemsBrent Ozar
 
Columnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil AgarwalColumnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil AgarwalBrent Ozar
 
500-Level Guide to Career Internals
500-Level Guide to Career Internals500-Level Guide to Career Internals
500-Level Guide to Career InternalsBrent Ozar
 
Introduction to SQL Server Internals: How to Think Like the Engine
Introduction to SQL Server Internals: How to Think Like the EngineIntroduction to SQL Server Internals: How to Think Like the Engine
Introduction to SQL Server Internals: How to Think Like the EngineBrent Ozar
 
Building a Fast, Reliable SQL Server for kCura Relativity
Building a Fast, Reliable SQL Server for kCura RelativityBuilding a Fast, Reliable SQL Server for kCura Relativity
Building a Fast, Reliable SQL Server for kCura RelativityBrent Ozar
 
How to Make SQL Server Go Faster
How to Make SQL Server Go FasterHow to Make SQL Server Go Faster
How to Make SQL Server Go FasterBrent Ozar
 
500-Level Guide to Career Internals
500-Level Guide to Career Internals500-Level Guide to Career Internals
500-Level Guide to Career InternalsBrent Ozar
 
What I Learned About SQL Server at Ignite 2015
What I Learned About SQL Server at Ignite 2015What I Learned About SQL Server at Ignite 2015
What I Learned About SQL Server at Ignite 2015Brent Ozar
 

Mehr von Brent Ozar (10)

Dynamic SQL: How to Build Fast Multi-Parameter Stored Procedures
Dynamic SQL: How to Build Fast Multi-Parameter Stored ProceduresDynamic SQL: How to Build Fast Multi-Parameter Stored Procedures
Dynamic SQL: How to Build Fast Multi-Parameter Stored Procedures
 
Headaches of Blocking, Locking, and Deadlocking
Headaches of Blocking, Locking, and DeadlockingHeadaches of Blocking, Locking, and Deadlocking
Headaches of Blocking, Locking, and Deadlocking
 
"But It Worked In Development!" - 3 Hard SQL Server Problems
"But It Worked In Development!" - 3 Hard SQL Server Problems"But It Worked In Development!" - 3 Hard SQL Server Problems
"But It Worked In Development!" - 3 Hard SQL Server Problems
 
Columnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil AgarwalColumnstore Customer Stories 2016 by Sunil Agarwal
Columnstore Customer Stories 2016 by Sunil Agarwal
 
500-Level Guide to Career Internals
500-Level Guide to Career Internals500-Level Guide to Career Internals
500-Level Guide to Career Internals
 
Introduction to SQL Server Internals: How to Think Like the Engine
Introduction to SQL Server Internals: How to Think Like the EngineIntroduction to SQL Server Internals: How to Think Like the Engine
Introduction to SQL Server Internals: How to Think Like the Engine
 
Building a Fast, Reliable SQL Server for kCura Relativity
Building a Fast, Reliable SQL Server for kCura RelativityBuilding a Fast, Reliable SQL Server for kCura Relativity
Building a Fast, Reliable SQL Server for kCura Relativity
 
How to Make SQL Server Go Faster
How to Make SQL Server Go FasterHow to Make SQL Server Go Faster
How to Make SQL Server Go Faster
 
500-Level Guide to Career Internals
500-Level Guide to Career Internals500-Level Guide to Career Internals
500-Level Guide to Career Internals
 
What I Learned About SQL Server at Ignite 2015
What I Learned About SQL Server at Ignite 2015What I Learned About SQL Server at Ignite 2015
What I Learned About SQL Server at Ignite 2015
 

Kürzlich hochgeladen

%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 

Kürzlich hochgeladen (20)

%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 

Getting Better Query Plans by Improving SQL's Estimates

  • 1. Cardinality Estimation Estimated vs actual fight to the death. Before we start, print page 1 of this PDF: BrentOzar.com/go/engine
  • 2. 99-05: dev, architect, DBA 05-08: DBA, VM, SAN admin 08-10: MCM, Quest Software Since: consulting DBA www.BrentOzar.com Help@BrentOzar.com
  • 5. Typical weekend errand Go to Binny’s. • If they have our champagne, buy it all. Then go to Whole Foods. • If you picked up champagne, get a lobster. • Otherwise, get one pound of salmon.
  • 6. Bad idea Go to Binny’s. • If they have our champagne, buy it all. Then go to Whole Foods. • If you picked up champagne, get one lobster for each bottle of champagne. • Otherwise, get one pound of salmon.
  • 8. Typical query errand Go to the Users table. • Find the most popular Location. Go back to the Users table. • Get all of the people in that Location • Sort them by DisplayName
  • 10. When SQL Server’s estimates are reasonably close to actuals, you’re getting a good plan. It may not be a fast plan, but it’ll accurately reflect the amount of work. (You could reduce work by changing the query or the indexes.)
  • 11. When SQL Server’s estimates are nowhere near actuals (like 100x-1,000x off) you’re usually getting a bad plan.
  • 12. Easy cardinality: GROUP BY Location Let’s start thinking like the engine again…
  • 13. This demo uses MAXDOP 1. It’s not that it’s a good idea. I just want to keep this execution plan really simple. Parallelism makes this plan more complex to read. Don’t worry, the tuning answer isn’t “use more cores.”
  • 14. Build an execution plan for this. SELECT Location, COUNT(*) FROM dbo.Users GROUP BY Location ORDER BY COUNT(*) DESC OPTION (MAXDOP 1); And tell me how much memory you’ll need (to keep it simple, how many rows will you handle?)
  • 15. Let’s see what SQL estimates. Get the estimated execution plan. This will collect the estimated execution plan. You could also press CTRL + L to gather the estimated execution plan.
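
If you'd rather script it than click the button, something like this should return the same estimated plan. It's a minimal sketch: `GO` is the SSMS/sqlcmd batch separator (not T-SQL), and `SET SHOWPLAN_XML` has to sit alone in its own batch.

```sql
-- While SHOWPLAN_XML is ON, statements are compiled but NOT executed:
-- you get the estimated plan back as XML instead of a result set.
SET SHOWPLAN_XML ON;
GO

SELECT Location, COUNT(*)
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC
OPTION (MAXDOP 1);
GO

-- Turn it back off, or nothing else in this session will actually run.
SET SHOWPLAN_XML OFF;
GO
```
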
  • 16. Right to left, top to bottom Read the way the arrows point
  • 17. Step 1: Clustered index scan
  • 18. Another way: check the arrows The arrows coming out of most operators are a faster/easier way to check row estimations.
  • 19. Step 2: Hash Match Aggregate
  • 20. Step 3: Compute Scalar
  • 23. Cost Each statement has a total cost. A statement’s cost is the sum of the cost of all operators in a statement. Cost is just an arbitrary metric of the estimated cost to get work done. We call that Query Bucks.
  • 26. To discover if our estimates are wrong, we have to run it and get the actual plan.
  • 27. Getting an actual plan The only way to get an actual plan is to capture it at the exact moment when the query finishes, like with: • SSMS, with actual plans turned on • Extended Events • Profiler trace You can’t get the actual plan from the DMVs after the query finishes. You’ll see why.
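
For the SSMS route, a scripted equivalent of "actual plans turned on" looks roughly like this. Unlike the estimated plan, the query really runs, and the plan XML comes back as an extra result set with the runtime numbers in it.

```sql
-- STATISTICS XML executes the statement AND returns the actual plan.
SET STATISTICS XML ON;
GO

SELECT Location, COUNT(*)
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC
OPTION (MAXDOP 1);
GO

SET STATISTICS XML OFF;
GO
```
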
  • 28. Estimated plan top, actual bottom Spot the difference between them?
  • 29. Hover over the hash match Estimated plan at left, actual plan at right. The actual plan adds actual row counts.
  • 30. Every time a query runs, these can change: • Warnings • Row counts • Spills • Numbers of executions • Memory grants • Wait stats • Join types (in adaptive plans) The shape of the est vs actual plan is the same, but lots of metrics can change.
  • 31. Step 1: Clustered Index Scan Estimates vs actuals are right. No surprise: it’s just a count of all the rows in the table, and the table contents aren’t changing.
  • 32. Arrows: handy shortcuts Hover the mouse over the arrows to see mismatches. Doesn’t show actual data size, though.
  • 33. Step 2: Hash match aggregate Now the estimate starts to matter: SQL Server needs memory to do this, and it only estimates enough memory to handle the estimated number of rows. When it runs out, it spills to TempDB.
  • 34. How many Locations came out? We have 2 problems here: • Our estimates were wrong • Which means our memory estimates are wrong for the rest of the plan, too
  • 35. How accurate were your own estimates?
  • 36. SQL Server did the best it could. There’s no magic here. No person or system could possibly get this estimate right without changing something about the database, like storing the number of locations. More on that in a minute.
  • 37. Step 3: Compute Scalar
  • 38. Step 4: Sort Our bad estimates strike again: we didn’t estimate enough memory to do the sort. So now we spill the data to disk AGAIN.
  • 41. Every time the query runs… We’re going to underestimate row counts. We’re going to underallocate memory. We’re going to write to TempDB. We’re going to burn more CPU time sorting.
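
One way to see that this keeps happening, without catching an actual plan each time, is to look at the grant and spill counters in the plan cache. This is a sketch under assumptions: the grant and spill columns only exist on newer builds (the spill columns arrived around SQL Server 2016 SP2 / 2017 CU3, as far as I know), and the `LIKE` filter is just a hypothetical way to find our demo query.

```sql
SELECT TOP (20)
    st.text               AS query_text,
    qs.execution_count,
    qs.last_grant_kb,       -- memory granted on the most recent run
    qs.last_used_grant_kb,  -- memory actually used on the most recent run
    qs.last_spills,         -- pages spilled to TempDB on the most recent run
    qs.total_spills         -- pages spilled since the plan was compiled
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%GROUP BY Location%'   -- hypothetical filter; also matches this query itself
ORDER BY qs.total_spills DESC;
```
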
  • 42. So how do we fix it? • Updating the existing statistics • OPTION (RECOMPILE) • Switching back to the 2012 Cardinality Estimator • Using 2017’s Adaptive Memory Grants • Adding a statistic (on what columns?) • Adding an index (on what columns?)
  • 43. Does updating stats help? No – we don’t have stats on the Location column. We can update our brains out, but nothing will change.
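
To check that claim on your own copy of the database, you could list the statistics that exist on dbo.Users and the columns they cover. A minimal sketch using the catalog views; if nothing comes back for Location, there's nothing for an update to refresh.

```sql
SELECT
    s.name                  AS stats_name,
    s.auto_created,           -- 1 = SQL Server created it on its own
    c.name                  AS column_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.modification_counter   -- changes since the last stats update
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id
   AND sc.stats_id  = s.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id
   AND c.column_id = sc.column_id
OUTER APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Users')
ORDER BY s.name, sc.stats_column_id;
```
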
  • 44. OPTION (RECOMPILE)? No, that doesn’t help either: we built a fresh plan for this, and it was still wrong.
  • 45. The Cardinality Estimator (CE) The SQL Server engine code that guesses how many rows will come back from operations. • SQL Server 2005-2012: stayed the same • SQL Server 2014+: new CE available if you set your database compatibility level to 2014 (level 120) or higher
  • 46. No winners here – they both lose. Compatibility level 2012: Compat 2014/16/17:
  • 47. The new CE was just new. It’s not magic. It’s not even new anymore.
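
If you want to run that comparison yourself without flipping the whole database's compatibility level, a query-level hint is probably the least invasive option. This sketch assumes SQL Server 2016 SP1 or later, where `USE HINT` is available; run the query once with the hint and once without, and compare the estimates.

```sql
-- Which CE does the current database use by default?
-- 110 and lower = legacy CE, 120 and higher = new CE.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Same demo query, but forced onto the legacy (pre-2014) estimator.
SELECT Location, COUNT(*)
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC
OPTION (MAXDOP 1, USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```
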
  • 48. SQL 2017 helps – kinda. Adaptive Memory Grants: SQL Server tracks when this happens, and starts adjusting memory grants over time. Unfortunately, right now it only works in batch mode: meaning, we need a columnstore index in the query. That’s not the right answer to this tuning problem.
  • 49. Will adding a statistic help? CREATE STATISTICS Stat_Location ON dbo.Users(Location);
  • 50. Will adding a statistic help? CREATE STATISTICS Stat_Location ON dbo.Users(Location); Even if it does help, you still have to sort every time:
  • 51. Will adding an index help? CREATE INDEX IX_Location ON dbo.Users(Location);
  • 52. Will adding an index help? CREATE INDEX IX_Location ON dbo.Users(Location); Getting better:
  • 53. And check out those estimates. • We know how many rows will come out • Rows are sorted: no hash match, now stream aggregate • We still have to sort by most popular locations though
  • 54. Sort: still spills! Look at estimated vs actual. We knew we’d get 110k rows. But SQL Server still didn’t allocate enough memory.
  • 55. Query tuning involves Each plan operator tries to predict: • How much data will come in • How much resources we’ll need in order to perform the required work in this operator • How much data will go out to the next operator Your job as a query tuner: figure out where those are going wrong, and make changes to make SQL Server’s life easier.
  • 56. When SQL Server’s estimates are reasonably close to actuals, you’re getting a good plan. It may not be a fast plan, but it’ll accurately reflect the amount of work. (You could reduce work by changing the query or the indexes.)
  • 57. Query tuning involves Each plan operator tries to predict: • How much data will come in • How much resources we’ll need in order to perform the required work in this operator • How much data will go out to the next operator Your job as a query tuner: figure out where those are going wrong, and make changes to make SQL Server’s life easier. Changed with the index
  • 58. Query tuning involves Each plan operator tries to predict: • How much data will come in • How much resources we’ll need in order to perform the required work in this operator • How much data will go out to the next operator Your job as a query tuner: figure out where those are going wrong, and make changes to make SQL Server’s life easier. We can change this too
  • 59. Here’s the query again. SELECT Location, COUNT(*) FROM dbo.Users GROUP BY Location ORDER BY COUNT(*) DESC OPTION (MAXDOP 1); Do we really need all locations every time this runs? Or can we paginate the data in our application?
  • 60. “Buy it all” is a bad errand. Go to Binny’s. • If they have our champagne, buy it all. Then go to Whole Foods. • If you picked up champagne, get a lobster. • Otherwise, get one pound of salmon.
  • 61. Let’s try just the top 100. SELECT TOP 100 Location, COUNT(*) FROM dbo.Users GROUP BY Location ORDER BY COUNT(*) DESC OPTION (MAXDOP 1);
  • 62. No spills on the sort. And now, the only yellow bang is on the SELECT. SQL Server is complaining that we granted TOO MUCH memory – and it’s just 1MB.
  • 63. The sorts are different. One sorts all rows and keeps them all; the other sorts all rows but only keeps N of them. And the sort algorithm even changes depending on what N is! https://www.brentozar.com/archive/2017/09/much-can-one-row-change-query-plan-part-1/
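
If the application does paginate, pages after the first could be fetched with OFFSET/FETCH (SQL Server 2012+). This is only a sketch: `@PageNumber`, `@PageSize`, and the `UsersInLocation` alias are hypothetical names, and the extra `Location` sort key is my addition so ties don't shuffle between pages. Note that deeper pages still have to aggregate every location first; only the number of rows kept changes.

```sql
DECLARE @PageNumber INT = 2;    -- hypothetical: 1-based page requested by the app
DECLARE @PageSize   INT = 100;  -- hypothetical: same size as the TOP 100 example

SELECT Location, COUNT(*) AS UsersInLocation
FROM dbo.Users
GROUP BY Location
ORDER BY COUNT(*) DESC, Location          -- tie-breaker keeps page boundaries stable
OFFSET (@PageNumber - 1) * @PageSize ROWS -- skip the earlier pages
FETCH NEXT @PageSize ROWS ONLY            -- keep just one page
OPTION (MAXDOP 1);
```
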
  • 64. What we learned so far To get accurate estimates, the Cardinality Estimator (CE) needs statistics. Indexes get you those statistics. You can’t just tune the queries in isolation: the right indexes are a required foundation. Even when estimates are completely accurate, you still may not get the memory you might want to join/sort everything in memory. (Especially for large real-world loads.)
  • 65. Now let’s do a 2-part errand. A one, and a two
  • 66. Queries usually multiply data. Go to Binny’s. • If they have our champagne, buy it all. Then go to Whole Foods. • If you picked up champagne, get one lobster for each bottle of champagne. • Otherwise, get one pound of salmon.
  • 67. Build an execution plan for this. DECLARE @TopLocation NVARCHAR(100); SELECT TOP 1 @TopLocation = Location FROM dbo.Users WHERE Location <> '' GROUP BY Location ORDER BY COUNT(*) DESC; SELECT * FROM dbo.Users WHERE Location = @TopLocation ORDER BY DisplayName;
  • 68. Build an execution plan for this. DECLARE @TopLocation NVARCHAR(100); SELECT TOP 1 @TopLocation = Location FROM dbo.Users WHERE Location <> '' GROUP BY Location ORDER BY COUNT(*) DESC; SELECT * FROM dbo.Users WHERE Location = @TopLocation ORDER BY DisplayName; Variables only store 1 row.
  • 69. Build an execution plan for this. DECLARE @TopLocation NVARCHAR(100); SELECT TOP 1 @TopLocation = Location FROM dbo.Users WHERE Location <> '' GROUP BY Location ORDER BY COUNT(*) DESC; SELECT * FROM dbo.Users WHERE Location = @TopLocation ORDER BY DisplayName; We’re only going to put 1 value in @TopLocation. If we thought really hard, we could possibly even use statistics to predict what that value might be.
  • 70. Build an execution plan for this. DECLARE @TopLocation NVARCHAR(100); SELECT TOP 1 @TopLocation = Location FROM dbo.Users WHERE Location <> '' GROUP BY Location ORDER BY COUNT(*) DESC; SELECT * FROM dbo.Users WHERE Location = @TopLocation ORDER BY DisplayName; But now the SELECT runs. How can it predict how many rows will return?
  • 71. The whole plan is built at once. DECLARE @TopLocation NVARCHAR(100); SELECT TOP 1 @TopLocation = Location FROM dbo.Users WHERE Location <> '' GROUP BY Location ORDER BY COUNT(*) DESC; SELECT * FROM dbo.Users WHERE Location = @TopLocation ORDER BY DisplayName;
  • 72. Estimates are built at once, too.
  • 73. Of course the estimates are wrong. Boo, hiss
  • 74. SQL Server compiles batches. It has to build an execution plan for the entire batch all at the same time. A stored procedure is a big batch. You can put OPTION (RECOMPILE) on statements, forcing SQL Server to build a new execution plan given what it knows so far.
  • 75. New plan for this one statement
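
The slide's screenshot isn't in this transcript, but applied to the batch from earlier, a statement-level hint would presumably look like this: only the second statement gets recompiled, at the point where @TopLocation already holds a value.

```sql
DECLARE @TopLocation NVARCHAR(100);

-- Step 1: unchanged, finds the most popular location.
SELECT TOP 1 @TopLocation = Location
FROM dbo.Users
WHERE Location <> ''
GROUP BY Location
ORDER BY COUNT(*) DESC;

-- Step 2: recompiled at run time, so the optimizer can use the
-- actual value of @TopLocation when it estimates rows.
SELECT *
FROM dbo.Users
WHERE Location = @TopLocation
ORDER BY DisplayName
OPTION (RECOMPILE);
```
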
  • 76. A different plan. At first, this might look worse: a clustered index scan of the entire Users table, ignoring the Location index. But it actually does fewer logical reads because we’re dealing with a lot of Users. And note – no spill on the sort.
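
To compare logical reads for yourself, one option is to run both versions of the final SELECT with STATISTICS IO on and read the Messages tab. A sketch, assuming the IX_Location index from earlier exists; the numbers you get depend on your copy of the data.

```sql
SET STATISTICS IO ON;

DECLARE @TopLocation NVARCHAR(100);

SELECT TOP 1 @TopLocation = Location
FROM dbo.Users
WHERE Location <> ''
GROUP BY Location
ORDER BY COUNT(*) DESC;

-- Version 1: plan compiled with the rest of the batch.
SELECT *
FROM dbo.Users
WHERE Location = @TopLocation
ORDER BY DisplayName;

-- Version 2: plan rebuilt once @TopLocation is known.
SELECT *
FROM dbo.Users
WHERE Location = @TopLocation
ORDER BY DisplayName
OPTION (RECOMPILE);

SET STATISTICS IO OFF;
```
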
  • 77. It’s asking for a missing index. But it includes EVERY SINGLE FIELD. Over here in reality, we can’t usually create indexes like that. (If you can, great, do it – but notice AboutMe and its horrible datatype.)
  • 78. What OPTION (RECOMPILE) does • Forces SQL Server to stop and build a new plan • Takes effect at the level where you put it Here, we’re recompiling a single statement because what comes out of the prior query changes EVERYTHING we do: • The index we use, and the way we use it • How much memory we need
  • 79. Using recompile hints • They help when SQL Server needs to reset expectations about what it’s about to do • But you have to know where to put them Typically best used when: • You’re doing multi-step processing • The amount of data varies WIDELY • You can identify the point where things change dramatically, and stick the hint there
  • 81. I got rid of the variable. Former query #1 Former query #2
  • 82. But the plan is built all at once. SQL Server knows how many rows the CTE will produce: just 1, SELECT TOP 1. But it has no idea what the location will be. It doesn’t execute the CTE first, get the location, and then execute the query. This whole thing is done at once.
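
The slide's query isn't in this transcript, so here's a guess at its shape, not the original: the two former queries folded into one statement with a CTE. The `TopLocation` CTE name and the table aliases are hypothetical.

```sql
-- Former query #1 becomes the CTE...
WITH TopLocation AS (
    SELECT TOP 1 Location
    FROM dbo.Users
    WHERE Location <> ''
    GROUP BY Location
    ORDER BY COUNT(*) DESC
)
-- ...and former query #2 joins to it instead of reading a variable.
SELECT u.*
FROM TopLocation AS t
JOIN dbo.Users AS u
    ON u.Location = t.Location
ORDER BY u.DisplayName;
```
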
  • 83. Read the plan right to left. SQL Server’s doing the same thing it did before in the 2-query process: first, it builds the list of most popular locations, takes the top 1, and then looks up the users in that location.
  • 84. Is this estimate going to be right? Do we know how many locations we’ll find? Scan the list of locations
  • 85. Is this estimate going to be right? Do we know how many locations the Sort will push out? Sort them by COUNT(*)
  • 86. This one is a little trickier. We’re going to seek to that Location name. But 2 questions: 1. How many times are we going to seek to a Location? 2. How many rows are going to come out of the seek? Seek to that Location name
  • 87. From “Think Like the Engine” A “seek” sounds like it’s only going to return 1 row. A “scan” sounds like it returns the whole table. But here’s what they really mean: Seek = start reading at one specific location Scan = start reading at either end of the object
  • 88. Decoding it Seek predicate: jump to the Location that we found in the earlier operations, and start reading there. We just don’t know what that Location will be when we build the plan. We can’t possibly know.
  • 89. Estimated rows The estimate is just garbage. SQL Server’s using the density vector (more on that in Think Like the Engine and Mastering Index Tuning.) It’s guessing based on how many rows the average Location has. But we’re asking for the biggest one.
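
To see where a guess like that comes from, you could look at the density vector on the index's statistics and redo the arithmetic by hand. A sketch, assuming the IX_Location index from earlier: density is 1 / (number of distinct values), and the estimate is roughly density × table rows, which is the size of the average Location. The second query only approximates it, because COUNT(DISTINCT) ignores NULLs.

```sql
-- "All density" for Location = 1 / distinct Location values.
DBCC SHOW_STATISTICS ('dbo.Users', IX_Location) WITH DENSITY_VECTOR;

-- Rough hand check: average rows per Location.
SELECT
    COUNT(*)                 AS TotalRows,
    COUNT(DISTINCT Location) AS DistinctLocations,
    COUNT(*) * 1.0 / NULLIF(COUNT(DISTINCT Location), 0) AS AvgRowsPerLocation
FROM dbo.Users;
```
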
  • 90. Actual rows Estimates: 13 Actual: 37,810 These are way off, and it has a cascading effect on the rest of the plan.
  • 91. This one is a little trickier. So now, as we move through the rest of the plan, we’re going to have issues. Est 13, actual 37,810
  • 92. This one is a little trickier. For each User.Id we found, go get the rest of the fields that weren’t included in our Location index. Again, 2 questions: 1. How many times are we going to do this key lookup? 2. How many rows are going to come out of each one? Get the SELECT *
  • 93. The entire popup I’ll zoom in on this one part… The important part
  • 94. Dammit, Beavis FOR THE LOVE OF ALL THAT’S HOLY CAN YOU PLEASE PUT THE FIELDS IN SOME KIND OF ORDER AND BE CONSISTENT ABOUT WHETHER YOU PREFIX THINGS WITH ACTUAL OR NOT
  • 95. Decoding it
      The label | The value | What it means
      Estimated Number of Executions | 13 |
      Number of Executions | 37,810 | Which is bad because each one produces a few logical reads, so we read more pages than the entire table
      Estimated Number of Rows | 1 | The number of rows that will come out of EACH OPERATION, which is really misleading
      Actual Number of Rows | 37,810 | The total number of rows that came out of ALL THE OPERATIONS
  • 96. Is this estimate going to be right? The Sort needed to estimate how many rows it’d be dealing with, which affects our overall memory grant. Uh no
  • 97. This CTE is a great example. Even in a small query like this: • SQL Server processes data in order, in steps • Can kinda be thought of as a stored procedure • Estimates can go wrong in any step Our job: • Read the plan right to left, top to bottom • Understand where estimates are going wrong • Help SQL Server make better estimations
  • 98. Whew.
  • 99. What we learned Cardinality estimation involves: • Predicting how many rows will come back • Guessing the contents of those rows to predict how many rows will come back from other operations Lots of ways to accomplish it with varying success: • Updating the existing statistics • OPTION (RECOMPILE) • Switching back to the 2012 Cardinality Estimator • 2017’s Adaptive Query Processing • Changing the database (indexes, stats) • Rewriting queries to break them up, or combine them
  • 100. Wanna learn more? Mastering Index Tuning Mastering Query Tuning Mastering Server Tuning PASS Summit Pre-Con BrentOzar.com/training