2. Overview
Understanding DB2 Optimizer
SQL Coding Strategies & Guidelines
Filter Factor
Stage 1 & Stage 2 Predicates
Explain table
How to interpret the Explain Tables
Using Monitoring Tools to understand the performance of SQLs
BMC Apptune
BMC SQL Explorer
3. SQL Coding Strategies & Guidelines
[Diagram: SQL statement -> DB2 Optimizer (cost-based; uses query cost formulas and the DB2 Catalog) -> optimized access path]
Determines database navigation
Parses SQL statements for tables and columns which must be accessed
Queries statistics from DB2 Catalog (populated by RUNSTATS utility)
Determines least expensive access path
Checks Authorization
The DB2 Optimizer is Cost Based and chooses the least expensive access path
4. SQL Coding Strategies & Guidelines
Avoid unnecessary execution of SQL
Consider accomplishing as much as possible with a single call, so as to minimize table
access as far as possible.
Limit the data selected (rows & columns) using SQL and avoid filtering using Application
programs.
As far as possible, code predicates on indexable columns.
Use equivalent data types for comparison. This avoids the data type conversion overhead.
JOIN tables on Indexed columns.
Avoid Cartesian Products.
The DISTINCT, ORDER BY, GROUP BY, and UNION clauses can each involve a SORT operation. Use
these clauses only when they are really needed.
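As a minimal sketch of limiting rows and columns in SQL rather than in the application program, assume a hypothetical EMP table with an index on WORKDEPT. The table, column, and host variable names are illustrative and do not come from the original slides.

```sql
-- Less efficient: every column of every row is returned,
-- and the program throws away what it does not need.
SELECT *
  FROM EMP;

-- Preferred: DB2 returns only the needed columns and rows.
-- WORKDEPT is assumed to be an indexed column.
SELECT EMPNO, LASTNAME
  FROM EMP
 WHERE WORKDEPT = :HVDEPT;
```

The second form lets DB2 apply the predicate before any data reaches the program, which is the point of the guideline above.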
5. SQL Coding Strategies & Guidelines
Cursor Usage Tips
Use Singleton SELECT statements, if you need to retrieve one row only. This
gives a far better performance than cursors.
SELECT …
INTO :<host variables>
Cursors should be used when you have more than one row to be retrieved.
Cursors have the overhead of OPEN, FETCH & CLOSE.
To update rows using a Cursor, use the FOR UPDATE OF clause.
Use FOR FETCH ONLY clause when the cursor is used for data retrieval only.
FOR READ ONLY clause provides the same functionality and is ODBC
compliant.
Use the WITH HOLD clause if you don’t want DB2 to automatically close the
cursor when the application issues a COMMIT statement.
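The cursor tips above could look roughly like this in embedded SQL. The EMP table, its columns, the cursor names, and the host variables are hypothetical. In a real program each statement would sit inside the host language's EXEC SQL delimiters.

```sql
-- One row expected: singleton SELECT, no OPEN/FETCH/CLOSE overhead.
SELECT LASTNAME, SALARY
  INTO :HVNAME, :HVSAL
  FROM EMP
 WHERE EMPNO = :HVEMPNO;

-- Read-only cursor that stays open across COMMIT.
DECLARE C1 CURSOR WITH HOLD FOR
  SELECT EMPNO, LASTNAME
    FROM EMP
   WHERE WORKDEPT = :HVDEPT
     FOR FETCH ONLY;

-- Updatable cursor: name the columns that will be changed.
DECLARE C2 CURSOR FOR
  SELECT EMPNO, SALARY
    FROM EMP
   WHERE WORKDEPT = :HVDEPT
     FOR UPDATE OF SALARY;

-- After a FETCH from C2, update the row the cursor is positioned on.
UPDATE EMP
   SET SALARY = :HVSAL
 WHERE CURRENT OF C2;
```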
Static Vs Dynamic SQL
The access paths for Dynamic SQL are determined at run time, which results in
additional overhead. Also, users need privileges directly on the tables.
The access paths for Static SQL are determined at bind time and reused at run
time. Users need only the EXECUTE privilege on the plan.
6. SQL Coding Strategies & Guidelines
UNION and UNION ALL
Predicates on different columns combined with OR often cannot be used for matching access on a single index. Consider rewriting the query as
the union of two SELECT statements, making index access possible.
UNION ALL keeps duplicates, and hence avoids the duplicate-removal SORT that UNION needs.
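A sketch of the OR-to-UNION rewrite, assuming a hypothetical EMP table with separate indexes on WORKDEPT and JOB, and assuming WORKDEPT is defined NOT NULL. The second branch excludes rows already returned by the first, so UNION ALL can be used without producing duplicates.

```sql
-- Original: OR across two different columns.
SELECT EMPNO, LASTNAME
  FROM EMP
 WHERE WORKDEPT = 'D11'
    OR JOB = 'MANAGER';

-- Rewrite: each branch has a simple predicate on one indexed column.
-- The extra WORKDEPT <> 'D11' test keeps the branches disjoint
-- (valid only because WORKDEPT is assumed NOT NULL).
SELECT EMPNO, LASTNAME
  FROM EMP
 WHERE WORKDEPT = 'D11'
UNION ALL
SELECT EMPNO, LASTNAME
  FROM EMP
 WHERE JOB = 'MANAGER'
   AND WORKDEPT <> 'D11';
```

Using plain UNION instead would also remove the overlap, at the price of a sort. Whether either rewrite actually helps should be confirmed with EXPLAIN.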
The BETWEEN clause
BETWEEN is usually more efficient than using <= and >= operators, except
when comparing a host variable to 2 columns
Stage 2: WHERE :hostvar BETWEEN col1 AND col2
Stage 1: WHERE col1 <= :hostvar AND col2 >= :hostvar
7. SQL Coding Strategies & Guidelines
Use IN Instead of LIKE
If only a known, limited set of values exists and can be put in a
list,
use IN or BETWEEN:
IN ('Value1', 'Value2', 'Value3')
BETWEEN :valuelow AND :valuehigh
rather than:
LIKE 'Value_'
Use LIKE With Care
Avoid the % or the _ at the beginning of the pattern, because it prevents DB2 from using
a matching index scan and may cause a scan of the whole index or table space.
Use the % or the _ at the end of the pattern to allow index usage.
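The following sketch contrasts the LIKE and IN guidelines above. It assumes a hypothetical EMP table with indexes on LASTNAME and on JOB. The literal values are made up for illustration.

```sql
-- Trailing wildcard: a matching index scan on LASTNAME is possible.
SELECT EMPNO, LASTNAME
  FROM EMP
 WHERE LASTNAME LIKE 'SMI%';

-- Leading wildcard: DB2 cannot match on the index key,
-- so expect a non-matching index scan or a tablespace scan.
SELECT EMPNO, LASTNAME
  FROM EMP
 WHERE LASTNAME LIKE '%SON';

-- Known, small set of values: list them with IN instead of LIKE.
SELECT EMPNO, LASTNAME
  FROM EMP
 WHERE JOB IN ('CLERK', 'ANALYST', 'MANAGER');
```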
8. SQL Coding Strategies & Guidelines
Use NOT operator with care
Predicates formed using NOT (except NOT EXISTS) are Stage 1, but are not
indexable.
For subqueries that use negation logic:
• Use NOT EXISTS instead of NOT IN
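A sketch of the NOT IN to NOT EXISTS rewrite, using hypothetical DEPT and EMP tables. The two forms return the same rows only if EMP.WORKDEPT and DEPT.DEPTNO contain no NULLs, which is assumed here.

```sql
-- NOT IN with a subquery.
SELECT D.DEPTNO, D.DEPTNAME
  FROM DEPT D
 WHERE D.DEPTNO NOT IN
       (SELECT E.WORKDEPT
          FROM EMP E);

-- NOT EXISTS: the correlated predicate can use an index
-- on EMP.WORKDEPT (assumed to exist).
SELECT D.DEPTNO, D.DEPTNAME
  FROM DEPT D
 WHERE NOT EXISTS
       (SELECT 1
          FROM EMP E
         WHERE E.WORKDEPT = D.DEPTNO);
```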
Code the Most Restrictive Predicate First
After the indexable predicates, code first the predicate that will eliminate the greatest number of
rows.
Avoid Arithmetic in Predicates
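An index is not used for a column when the column is part of an arithmetic
expression.
Such predicates are not indexable, and in DB2 V8 column expressions are generally evaluated at Stage 2. Move the arithmetic to the other side of the comparison so that the column stands alone.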
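A small sketch of moving arithmetic away from the column, assuming a hypothetical EMP table with an index on SALARY and a host variable :HVLIMIT:

```sql
-- Arithmetic on the column: the index on SALARY cannot be matched.
SELECT EMPNO
  FROM EMP
 WHERE SALARY + 1000 > :HVLIMIT;

-- Same condition with the column left alone:
-- the predicate is now a plain range predicate on SALARY.
SELECT EMPNO
  FROM EMP
 WHERE SALARY > :HVLIMIT - 1000;
```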
9. SQL Coding Strategies & Guidelines
Nested loop join is efficient when:
The outer table is small. Predicates with a small filter factor reduce the number of qualifying
rows in the outer table.
The number of data pages accessed in the inner table is also small.
A highly clustered index is available on the join columns of the inner table.
This join method is efficient when filtering on both tables (outer and inner) is
high.
This is the most common join method.
Merge scan join is used when:
The numbers of qualifying rows in the inner and outer tables are large and the join predicates do
not provide much filtering.
The tables are large and have no indexes with matching columns.
Hybrid join is used when:
A non-clustered index is available on the join column of the inner table and there are
duplicate qualifying rows in the outer table.
10. SQL Coding Strategies & Guidelines
Join Types & Join Predicate Considerations
Provide accurate JOIN predicates
Avoid JOIN without a predicate (Cartesian Join)
Join ON indexed columns
Use Joins over sub-queries
When the results of a join must be sorted:
Limiting the ORDER BY to columns of a single table can sometimes avoid a
sort.
Specifying columns from multiple tables always involves a sort.
Favor coding LEFT OUTER joins over RIGHT OUTER joins, as DB2 always
converts RIGHT joins to LEFT joins before executing them.
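The join guidelines above can be sketched as follows. EMP and DEPT are hypothetical tables, DEPT.DEPTNO is assumed to be unique (so the join cannot introduce duplicate rows), and both join columns are assumed to be indexed.

```sql
-- Subquery form.
SELECT E.EMPNO, E.LASTNAME
  FROM EMP E
 WHERE E.WORKDEPT IN
       (SELECT D.DEPTNO
          FROM DEPT D
         WHERE D.DEPTNAME = :HVDNAME);

-- Equivalent join form, joining on the indexed columns.
SELECT E.EMPNO, E.LASTNAME
  FROM EMP E
 INNER JOIN DEPT D
    ON E.WORKDEPT = D.DEPTNO
 WHERE D.DEPTNAME = :HVDNAME;

-- Outer join coded as LEFT rather than RIGHT; the ORDER BY
-- names a column from one table only.
SELECT D.DEPTNO, D.DEPTNAME, E.LASTNAME
  FROM DEPT D
  LEFT OUTER JOIN EMP E
    ON E.WORKDEPT = D.DEPTNO
 ORDER BY D.DEPTNO;
```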
11. SQL Coding Strategies & Guidelines
Sub-Query Guidelines
– If there are efficient indexes available on the tables in the subquery, then a
correlated subquery is likely to be the most efficient kind of subquery.
– If there are no efficient indexes available on the tables in the subquery, then
a non-correlated subquery would likely perform better.
– If there are multiple subqueries in any parent query, make sure that the
subqueries are ordered in the most efficient manner.
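To make the distinction concrete, here is the same question written both ways against hypothetical DEPT and EMP tables. Which form performs better depends on the available indexes, as described above, so both should be checked with EXPLAIN.

```sql
-- Correlated subquery: evaluated for each DEPT row.
-- Works well when an efficient index on EMP.WORKDEPT exists.
SELECT D.DEPTNO, D.DEPTNAME
  FROM DEPT D
 WHERE EXISTS
       (SELECT 1
          FROM EMP E
         WHERE E.WORKDEPT = D.DEPTNO
           AND E.JOB = 'MANAGER');

-- Non-correlated subquery: the subquery is evaluated once
-- and its result is compared with each DEPT row.
SELECT D.DEPTNO, D.DEPTNAME
  FROM DEPT D
 WHERE D.DEPTNO IN
       (SELECT E.WORKDEPT
          FROM EMP E
         WHERE E.JOB = 'MANAGER');
```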
12. SQL Coding Strategies & Guidelines
Techniques for Performance Improvement
Use OPTIMIZE FOR n ROWS
DB2 assumes that only the said number of rows will be retrieved by
the query before choosing the access path. It is basically like giving a
Hint to the DB2 Optimizer.
This does not stop the user from accessing the entire result set.
This is not useful when DB2 has to gather whole result set before
returning the first n rows.
With this clause, DB2 optimizes the query for quicker response.
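A minimal example of the clause, on a hypothetical EMP table. The value 20 is arbitrary and stands for the number of rows the application typically fetches.

```sql
-- Tell the optimizer that only about 20 rows will be fetched.
-- The cursor can still fetch the full result set if it needs to.
SELECT EMPNO, LASTNAME, SALARY
  FROM EMP
 WHERE WORKDEPT = :HVDEPT
 ORDER BY SALARY DESC
 OPTIMIZE FOR 20 ROWS;
```

If the ORDER BY here has to be satisfied by a sort rather than an index, DB2 must build the whole result first, which is the case where the clause brings little benefit.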
Updating catalog tables
If RUNSTATS is too costly or cannot be executed, the catalog statistics
can be updated manually.
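A hedged sketch of a manual statistics update. The schema name DEVSCHEMA, the table EMP, and every numeric value are made up for illustration. The statements need suitable authority on the catalog, and static SQL picks up new statistics only after a rebind.

```sql
-- Table-level statistics: row count and number of pages.
UPDATE SYSIBM.SYSTABLES
   SET CARDF  = 500000,
       NPAGES = 25000
 WHERE CREATOR = 'DEVSCHEMA'
   AND NAME    = 'EMP';

-- Column-level statistics: number of distinct values.
UPDATE SYSIBM.SYSCOLUMNS
   SET COLCARDF = 40
 WHERE TBCREATOR = 'DEVSCHEMA'
   AND TBNAME    = 'EMP'
   AND NAME      = 'WORKDEPT';
```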
Enhanced Techniques for Performance Improvement
13. SQL Coding Strategies & Guidelines
Influencing access path – Add extra Predicate
DB2 evaluates the access path based on information available in
catalog tables
Wrong catalog information or unavailable catalog information may
result in selection of wrong access path
A wrong access path may mean that the wrong index was selected, or
that an index was selected where a tablespace scan would be more
effective.
Code extra predicates or change a predicate to make DB2 use a
different access path.
Adding an extra predicate may also influence the selection of the join
method.
With an extra predicate, nested loop join may be selected because
DB2 assumes that more rows will be filtered out. The proper type of
predicate to add is WHERE T1.C1 = T1.C1.
Hybrid join is a costlier method, and outer joins do not use hybrid join.
So if DB2 chooses hybrid join, convert the inner join to an outer join and
add extra predicates to remove the unneeded rows.
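A sketch of the redundant-predicate technique. T1 and C1 come from the text above. T2 and the columns C2 and C3 are hypothetical additions for the example.

```sql
-- The last predicate is always true for rows that pass the join
-- predicate, so the result does not change. It only lowers the
-- optimizer's estimate of qualifying T1 rows, which can tip the
-- choice toward nested loop join.
SELECT T1.C1, T2.C2
  FROM T1, T2
 WHERE T1.C1 = T2.C1
   AND T1.C3 = :HV
   AND T1.C1 = T1.C1;
```

Compare the PLAN_TABLE rows before and after adding the predicate to confirm that the join method actually changed.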
14. SQL Coding Strategies & Guidelines
General recommendations
Make sure that
The queries are as simple as possible
Unused rows are not fetched. Filtering to be done by DB2 not in the application
program.
Unused columns are not selected
There is no unnecessary ORDER BY or GROUP BY Clause
Use page level locking and try to minimize lock duration.
Mass updates should be avoided.
Try to use indexable predicates wherever possible
Do not code redundant predicates
Make sure that the declared length of a host variable is not greater than the length
attribute of the data column.
If efficient indexes are available on the tables in the subquery, a correlated
subquery is likely to perform better. Otherwise a non-correlated subquery is likely to perform better.
If there are multiple subqueries, make sure that they are ordered in the most efficient
manner.
Summary
15. Optimizer assigns a “Filter Factor” (FF) to each predicate or
predicate combination
– Number between 0 and 1 that gives the estimated fraction of rows
that qualify
FF of 0.25 means 25% of the rows are estimated to qualify
– Calculated using available statistics from catalog tables
• Column cardinality (COLCARDF)
• HIGH2KEY/LOW2KEY
• Frequency statistics (FREQUENCYF in SYSCOLDIST)
Filter Factor (FF)
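As an illustration of how column cardinality feeds the estimate: for an equal predicate (COL = value) on a column with uniformly distributed values and no frequency statistics, the default filter factor is 1 / COLCARDF. The query below reads that number from the catalog for a hypothetical table DEVSCHEMA.EMP.

```sql
-- COLCARDF is -1 when no statistics have been collected,
-- so those rows are excluded to avoid a meaningless result.
SELECT NAME,
       COLCARDF,
       1E0 / COLCARDF AS EQUAL_FF
  FROM SYSIBM.SYSCOLUMNS
 WHERE TBCREATOR = 'DEVSCHEMA'
   AND TBNAME    = 'EMP'
   AND COLCARDF  > 0;
```

For example, a column with 40 distinct values gives an estimated filter factor of 1/40 = 0.025 for an equal predicate.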
16. RUNSTATS
RUNSTATS is a DB2 utility that gathers the catalog statistics used by the
optimizer, as well as statistics on the organization of an object
(table space, table, index, or column).
Accurate Statistics are a critical factor for performance of the SQL.
Updates the DB2 catalog and reports the statistics.
Some catalog statistics updated by RUNSTATS for use by the optimizer can be
manually updated with appropriate authorization (SYSADM).
19. Stage 1 vs. Stage 2 Predicates
Stage 1 predicates may use an available Index.
Stage 2 predicates cannot use any Index.
20. Wherever possible, prefer to use Stage 1 (Sargable) predicates in the
where clause. These are conditions that can be evaluated in the Data
Manager of DB2, before the results are passed to Relational Data
System (RDS). The more conditions that can be evaluated early on, the
more efficient data retrieval is.
Stage 1:
Refers to the DM (Data Manager)
A suitable index must exist for a Stage 1 predicate to use index access
Reduces disk I/O and buffer pool activity
Stage 2:
Refers to the RDS (Relational Data System)
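A sketch of turning a Stage 2 predicate into a Stage 1 predicate, assuming a hypothetical EMP table with an index on the DATE column HIREDATE. The classification shown is the DB2 V8 behaviour, matching the release named in the bibliography. Later releases may treat some expressions differently.

```sql
-- Stage 2 in DB2 V8: a scalar function is applied to the column,
-- so the predicate is evaluated by the RDS and cannot use the index.
SELECT EMPNO
  FROM EMP
 WHERE YEAR(HIREDATE) = 2005;

-- Stage 1 and indexable: the column stands alone in a range
-- predicate that selects the same rows.
SELECT EMPNO
  FROM EMP
 WHERE HIREDATE BETWEEN '2005-01-01' AND '2005-12-31';
```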
21. How does the optimizer calculate Filter Factors?
The lower the filter factor, the fewer rows are estimated to qualify. In general, this means a lower cost and a more efficient
query.
22. Explain
A tool that shows the access path used by a query.
The results of Explain are stored in the table PLAN_TABLE.
Explain can be run for a single query outside a program or for all
queries in a program.
For all queries in a program: use the EXPLAIN(YES) parameter
during BIND.
[Figure: sample Explain table output]
23. Explain
Explain can be run at bind time using parm value of EXPLAIN(YES)
A PLAN_TABLE must previously exist based on OWNER parm value on BIND or
current SQLID for dynamic SQL
Explain can also be run against dynamic SQL
DELETE FROM PLAN_TABLE WHERE QUERYNO = 999;
EXPLAIN PLAN SET QUERYNO = 999 FOR
<SELECT STATEMENT GOES HERE - USE ? IN PLACE OF HOST VARIABLES>;
SELECT * FROM PLAN_TABLE
WHERE QUERYNO = 999
ORDER BY QBLOCKNO, PLANNO;
Don’t forget to Explain everything
PLAN_TABLE is where all the tuning starts
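A filled-in version of the template above, as a sketch. The EMP table and the query number 100 are hypothetical. The PLAN_TABLE columns selected are the ones discussed in the slides that follow.

```sql
-- Clear any earlier rows for this query number.
DELETE FROM PLAN_TABLE WHERE QUERYNO = 100;

-- Explain the statement; ? replaces the host variable.
EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT LASTNAME
    FROM EMP
   WHERE EMPNO = ?;

-- Read back the columns most useful for a first look.
SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
  FROM PLAN_TABLE
 WHERE QUERYNO = 100
 ORDER BY QBLOCKNO, PLANNO, MIXOPSEQ;
```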
24. Non-Matching Index scan (ACCESSTYPE = I and MATCHCOLS = 0)
Scans all leaf pages of the index selected by the optimizer, selecting one or more
qualifying rows.
The scan can be with or without data access.
The predicate does not match the leading columns of the index.
SELECT COUNT(*) FROM TABLEA
SELECT MAX(COL1) FROM TABLEA
SELECT COL1 FROM TABLEA WHERE COL2 = :HV
Interpreting the Plan Table/Analyzing Access Paths
26. Matching Index scan (MATCHCOLS > 0)
Scans one or more leaf pages of the index selected by the optimizer, selecting
one or more qualifying rows. The index match is based on one or more key
columns of the selected index. The scan can be with or without data
access.
The predicates match the leading columns of the index.
SELECT COL1 FROM TABLEA WHERE COL2 = :HV
SELECT COL2 FROM TABLEA WHERE COL1 = :HV (host variable
length longer than COL1)
27. Matching Index Scan Diagram
[Diagram: index tree with a root page, non-leaf pages 1 and 2, leaf pages 1 to 4, and the data pages below them]
28. One Fetch Index Access (ACCESSTYPE = I1)
In certain circumstances, this can be the most efficient access path in DB2.
May need to access only one leaf page, but may still need to traverse the index tree.
Requires that only one row be retrieved (MIN or MAX column function).
SELECT MIN(COL1) FROM TABLEA
SELECT MIN(COL2) FROM TABLEA WHERE COL1 = :HV (will still be I1, but with
MATCHCOLS = 1)
29. IN List Index Scan (ACCESSTYPE = N)
Scans one or more leaf pages of the index selected by the optimizer, selecting one or more
qualifying rows.
The index match is based on one or more key columns of the selected index.
At least one key column incorporates an IN list.
SELECT * FROM TABLEA WHERE COL1 = :HV
AND COL2 IN ('A','B','C')
SELECT COL3 FROM TABLEA WHERE COL1 IN ('12345','56789')
AND COL2 = :HV
30. Table-space scan (ACCESSTYPE = R)
Scan against partitioned tablespace or simple tablespace with one table scans all pages
including pages which are empty or contain purely deleted rows.
Scan against simple tablespace containing more than one table includes scanning of
tables within that tablespace not necessarily included in the query.
Scan against segmented tablespace includes only pages containing data.
SELECT * FROM TABLEA
SELECT * FROM TABLEA WHERE COL6 = 0
SELECT * FROM TABLEA WHERE COL1 <> :HV
31. Tablespace Scan Diagram
[Diagram: Data Page 1, Data Page 2, Data Page 3, Data Page 4]
32. DB2 I/O Assisted Mechanisms
Prefetch
To read data ahead in anticipation of its use. Prefetch can read up to 32 4K
pages for applications, and up to 64 4K pages for utilities.
Sequential Prefetch
In DB2 UDB for OS/390, a mechanism that triggers consecutive asynchronous I/O
operations. Pages are fetched before they are required, and several pages are read
with a single I/O operation. This action is determined at bind time and can be detected
by a value of "S" in the PREFETCH column of the plan table. If both index and data are
required for the SQL, prefetch can occur for both object types.
Dynamic Prefetch
Uses the same approach as sequential prefetch, but the mechanism is triggered at run time if
DB2 detects that access to the index and/or data pages is sequential in nature but the pages are
distributed in a nonconsecutive manner.
List Prefetch
An access method that takes advantage of prefetching even in queries that do not
access data sequentially. This is done by scanning the index and collecting RIDs in
advance of accessing any data pages. These RIDs are then sorted in page number
order, and then data is prefetched using this list.
33. DB2 Explain Columns
QUERY Number –
Identifies the SQL statement in the PLAN_TABLE (any number you assign - the
example uses the numeric part of the userid)
BLOCK –
Query block within the query number, where 1 is the top level SELECT. Subselects,
unions, materialized views, and nested table expressions will show multiple query
blocks. Each QBLOCK has its own access path.
PLAN –
Indicates the order in which the tables will be accessed
34. DB2 Explain Columns
METHOD –
Shows which JOIN technique was used:
00- First table accessed, continuation of previous table accessed, or not used.
01- Nested Loop Join. For each row of the present composite table, matching rows of a
new table are found and joined
02- Merge Scan Join. The present composite table and the new table are scanned in the
order of the join columns, and matching rows are joined.
03- Sorts needed by ORDER BY, GROUP BY, SELECT DISTINCT, UNION, a quantified
predicate, or an IN predicate. This step does not access a new table.
04- Hybrid Join. The current composite table is scanned in the order of the join-column
rows of the new table. The new table accessed using list prefetch.
35. DB2 Explain Columns
TNAME –
name of the table whose access this row refers to. Either a table in the FROM clause, or
a materialized VIEW name.
TYPE (ACCESS TYPE) –
indicates whether an index was chosen:
I = INDEX
R = TABLESPACE SCAN (reads every data page of the table once)
I1 = ONE-FETCH INDEX SCAN
N = INDEX USING IN LIST
M = MULTIPLE INDEX SCAN
MX = NAMES ONE OF INDEXES USED
MI = INTERSECT MULT. INDEXES
MU = UNION MULT. INDEXES
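The METHOD and ACCESSTYPE codes listed above can be decoded directly in the query against PLAN_TABLE. This is a convenience sketch. The descriptive labels are paraphrased from the slides and QUERYNO 999 is the placeholder number used earlier.

```sql
SELECT QUERYNO, QBLOCKNO, PLANNO, TNAME,
       CASE METHOD
         WHEN 0 THEN 'FIRST TABLE ACCESSED'
         WHEN 1 THEN 'NESTED LOOP JOIN'
         WHEN 2 THEN 'MERGE SCAN JOIN'
         WHEN 3 THEN 'SORT'
         WHEN 4 THEN 'HYBRID JOIN'
         ELSE 'OTHER'
       END AS JOIN_METHOD,
       CASE ACCESSTYPE
         WHEN 'I'  THEN 'INDEX SCAN'
         WHEN 'I1' THEN 'ONE-FETCH INDEX SCAN'
         WHEN 'N'  THEN 'INDEX SCAN WITH IN LIST'
         WHEN 'R'  THEN 'TABLESPACE SCAN'
         WHEN 'M'  THEN 'MULTIPLE INDEX SCAN'
         WHEN 'MX' THEN 'INDEX USED IN MULTIPLE INDEX SCAN'
         WHEN 'MI' THEN 'INTERSECTION OF MULTIPLE INDEXES'
         WHEN 'MU' THEN 'UNION OF MULTIPLE INDEXES'
         ELSE 'OTHER'
       END AS ACCESS_DESC,
       MATCHCOLS, ACCESSNAME, INDEXONLY
  FROM PLAN_TABLE
 WHERE QUERYNO = 999
 ORDER BY QBLOCKNO, PLANNO, MIXOPSEQ;
```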
36. DB2 Explain Columns
MC (MATCHCOLS) - number of index key columns used in a matching index scan
ANAME (ACCESS NAME) - name of the index
IO (INDEX ONLY) - Y = index alone satisfies the data request
N = table must also be accessed
8 sort flags in two groups (new table and composite table): each group has four indicators showing why the sort is
necessary. Usually, a sort will cause the statement to run longer.
UNIQ - DISTINCT option or UNION was part of the query or IN list for subselect
JOIN - sort for Join
ORDERBY - order by option was part of the query
GROUPBY - group by option was part of the query
37. DB2 Explain Columns
Sort flags for 'new' (inner) tables:
SNU - SORTN_UNIQ - Y = remove duplicates, N = no sort
SNJ - SORTN_JOIN - Y = sort table for join, N = no sort
SNO - SORTN_ORDERBY - Y = sort for order by, N = no sort
SNG - SORTN_GROUPBY - Y = sort for group by, N = no sort
Sort flags for 'composite' (outer) tables:
SCU - SORTC_UNIQ - Y = remove duplicates, N = no sort
SCJ - SORTC_JOIN - Y = sort table for join, N = no sort
SCO - SORTC_ORDERBY - Y = sort for order by, N = no sort
SCG - SORTC_GROUPBY - Y = sort for group by, N = no sort
PF - PREFETCH - Indicates whether data pages were read in advance by prefetch.
S = pure sequential PREFETCH
L = PREFETCH through a RID list
Blank = unknown, or not applicable
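One way to use these flags is to list every plan step that performs a sort, together with its prefetch indicator. This is a sketch against the standard PLAN_TABLE column names. It scans all explained statements, so a QUERYNO or PROGNAME filter can be added as needed.

```sql
-- Plan steps where any of the eight sort flags is set.
SELECT QUERYNO, QBLOCKNO, PLANNO, TNAME,
       SORTN_UNIQ, SORTN_JOIN, SORTN_ORDERBY, SORTN_GROUPBY,
       SORTC_UNIQ, SORTC_JOIN, SORTC_ORDERBY, SORTC_GROUPBY,
       PREFETCH
  FROM PLAN_TABLE
 WHERE SORTN_UNIQ    = 'Y' OR SORTN_JOIN    = 'Y'
    OR SORTN_ORDERBY = 'Y' OR SORTN_GROUPBY = 'Y'
    OR SORTC_UNIQ    = 'Y' OR SORTC_JOIN    = 'Y'
    OR SORTC_ORDERBY = 'Y' OR SORTC_GROUPBY = 'Y'
 ORDER BY QUERYNO, QBLOCKNO, PLANNO;
```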
38. DB2 Explain Columns
MIXOPSEQ The sequence number of a step in a multiple index operation.
PAGE_RANGE Whether the table qualifies for page range screening, so that plans
scan only the partitions that are needed. Y = Yes; blank = No
COLUMN_FN_EVAL: When an SQL aggregate function is evaluated. R = while the
data is being read from the table or index; S = while performing a sort to satisfy a
GROUP BY clause; blank =after data retrieval and after any sorts.
QBLOCK_TYPE For each query block, an indication of the type of SQL operation
performed.
JOIN_TYPE: The type of join:
F FULL OUTER JOIN
L LEFT OUTER JOIN
S STAR JOIN
blank INNER JOIN or no join
RIGHT OUTER JOIN converts to a LEFT OUTER JOIN when you
use it, so that JOIN_TYPE contains L.
43. Setting Options in BMC APPTUNE
Use Workload Analysis.
Choose 6. Data source and 5. Time interval.
44. Viewing Reports in APPTUNE
Use various options to generate reports.
Reports are generated for programs.
45. Viewing SQLs in APPTUNE
Use Option S to show SQLs.
Use Option X to EXPLAIN SQLs.
46. Example of EXPLAIN Result in BMC APPTUNE
Screen callouts:
Cost calculated by the optimizer
Matching index scan performed
Matching columns used by the index
Table and index names used by the access path
52. Example of the SQL Tuning Process - Development
Step 1.3: Import Statistics From Production to Development
Use Option 7 - Migrate Access Path Statistics
53. Step 2: Identification of Problem SQL – Identify problem SQL
The SQL statement being analysed is shown.
The tool warns that cardinality is missing.
A predicate mismatch is also detected.
54. Step 2: Identification of Problem SQL – Check SQL Best Practices
No tool is available for checking best practices. This needs to be checked manually
using the SQL Best Practices document already published.
The slide shows a snippet of the related best practice from the SQL Guidelines
document.
55. Step 3: SQL Optimization – SQL Rewrite
No tool is available to rewrite SQL statements automatically. The statement needs to be
rewritten manually, and the subsequent steps for checking the new access
path then need to be performed.
56. Step 3: SQL Optimization – Compare Access paths
Access paths can be compared.
Notice the change in the estimated indicative cost. A different index is
being used now.
57. Bibliography
Redbooks at www.redbooks.ibm.com
DB2 UDB for z/OS V8 Everything you ever wanted to know… SG24-6079
DB2 UDB for z/OS V8 Performance Topics SG24-6465
DB2 for z/OS Application Design for High Performance and Availability SG24-7134
10/05
DB2 UDB for z/OS V8 Application Programming and SQL Guide
SQL Tuning Best Practices & Guidelines Document
In the IM Project & Document Database Process Document section
1) Database 'IM Project and Document Database'
2) Select the ‘Process Document’ Section
3) Select ‘By Process Category’
4) Select ‘Best Practices’
5) View ‘Table of Contents’
6) Select document 'Database Access - SQL Tuning Best Practice & Guidelines’