Due Diligence with Exadata
- 1. Due Diligence
Examining Exadata
Jonathan Lewis
jonathanlewis.wordpress.com
www.jlcomp.demon.co.uk
Who am I ?
Independent Consultant
28+ years in IT
24+ using Oracle
Strategy, Design, Review,
Briefings, Educational,
Trouble-shooting
Member of the Oak Table Network
Oracle ACE Director
Oracle author of the year 2006
Select Editor’s choice 2007
UKOUG Inspiring Presenter 2011
UKOUG Council member 2012
ODTUG 2012 Best Presenter (d/b)
O1 visa for USA
Jonathan Lewis Examining Exadata
© 2012 2 / 36
- 2. Small print (a)
Small print (b)
- 3. Small print (c)
Why choose Exadata?
• Political
• Economic
• Technical
- 4. Political
• Single Supplier
– No finger-pointing (+)
– Stranglehold (-)
• Single point of management
– No finger-pointing (+)
– No "lost" databases (+)
• What about needs for different versions (-)
Economic (a)
• Black Box
– Single supplier, pre-installed (+)
• No build time
– What about upgrades / patches (-)
– It's "just" Oracle (+)
• but not as we know it, Jim (-)
- 5. Economic (b)
• Bang per buck
– Quantity of hardware
– Licence costs for "matching" system
– Special features (cf. SAN vs. JBOD)
Technical
• Why is it good ?
• Why is it irrelevant ?
• Where are the nasty surprises?
- 6. USPs
• Smart Scans / offload
• Storage Indexes
• Hybrid Columnar Compression
• Smart flash cache
– (vs. Database flash cache)
Earliest Experiences (a)
- 7. Earliest Experiences (b)
Query
Earliest Experiences (c)
select small_vc, sum(rep_col)
from t1
group by small_vc
having sum(rep_col) > 1000
SELECT A1.C0, SUM(A1.C1) FROM :Q613000 A1
GROUP BY A1.C0 HAVING SUM(A1.C1)>1000
SELECT /*+ NO_EXPAND ROWID(A1) */
A1."SMALL_VC" C0, A1."REP_COL" C1
FROM "T1" PX_GRANULE(0, BLOCK_RANGE, DYNAMIC) A1
WAIT #1: nam='direct path read' ela= 12 p1=11 p2=2582 p3=4
WAIT #1: nam='direct path read' ela= 14747 p1=11 p2=2722 p3=8
WAIT #1: nam='PX Deq Credit: send blkd' ela= 2 p1=268500994 p2=1 p3=0
WAIT #1: nam='direct path read' ela= 3 p1=11 p2=2730 p3=8
WAIT #1: nam='direct path read' ela= 295 p1=11 p2=2738 p3=8
- 8. Earliest Experiences (d)
• Highs
– Local discs
– Hard-wired "network"
– Result sets (row/column projections) sent across n/w
• Lows
– Every node was a full Oracle instance
– No detailed information about which node to call
– Instances had to access discs on remote nodes
• (implemented at the chip level)
Smart Scan / Offload
[Diagram labels: Query; Database instance(s); Decomposition; ASM instance; Block Range Request; Cell code]
- 9. Flash Cache
Database Flash Cache ("Level 2 cache")
Cell / Smart Flash Cache
USPs
• Smart Scans / offload
• Smart flash cache
– (vs. Database flash cache)
• Storage Indexes
• Hybrid Columnar Compression
- 10. Storage Indexes (a)
Up to 8 columns
Memory Only for "tables"
One entry per 1MB of disk, each holding Low / High / Nulls for the columns tracked in that 1MB:
1st MB: ColX, ColY, ColZ
2nd MB: ColA, ColZ, ColB, ..., ColF
3rd MB: ColX, ColZ, ColA
Storage Indexes (a)
"cell smart table scan" request: where colX = 99
1st MB: ColX (Low 10, High 100, Nulls Yes), ColY, ColZ: Will visit
2nd MB: ColA, ColZ, ColX (no values yet), ColB, ..., ColF: Must visit, will load SI
3rd MB: ColX (Low 200, High 350, Nulls No), ColZ, ColA: Will skip
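The visit / skip decision on this slide can be sketched in a few lines of Python. This is only a toy model of the idea, not Oracle's cell code: the names `RegionEntry` and `scan_decision` are invented for illustration, and the Low / High values are the ones shown for ColX on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionEntry:
    """Hypothetical storage index entry for one column in one 1MB region."""
    low: int
    high: int
    has_nulls: bool


def scan_decision(entry: Optional[RegionEntry], value: int) -> str:
    """Decide what a scan for `column = value` does with one 1MB region."""
    if entry is None:
        # No entry for this column yet: the region has to be read
        # (and the entry can be populated as a side effect).
        return "must visit"
    if entry.low <= value <= entry.high:
        # The value might be in the region, so it has to be read.
        return "will visit"
    # The value is outside the Low/High range: the region cannot hold it.
    return "will skip"


# ColX entries for the three 1MB regions on the slide.
regions = [RegionEntry(10, 100, True), None, RegionEntry(200, 350, False)]
decisions = [scan_decision(entry, 99) for entry in regions]
print(decisions)  # ['will visit', 'must visit', 'will skip']
```

Note that "will visit" only means the value might be present; the Low / High range says nothing about whether 99 actually appears in the region.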
- 11. Storage Indexes (b)
select … from t1 where col1 = 'k1'; -- Slow
select … from t1 where col1 = 'k2'; -- Quicker
select … from t1 where col2 = 'k1'; -- Slow
select … from t1 where col2 = 'k2'; -- Quicker
...
select … from t1 where col8 = 'k1'; -- Slow
select … from t1 where col8 = 'k2'; -- Quicker
select … from t1 where col9 = 'k1'; -- Slow
select … from t1 where col9 = 'k2'; -- ????
select … from t1 where col1 = 'k1'; -- ????
select … from t1 where col2 = 'k1'; -- ????
Storage Indexes (c)
select … from t1 where colA = 'k1' and colB = 'k2'; -- Slow
select … from t1 where colA = 'k1'; -- Quick
select … from t1 where colB = 'k2'; -- Quick
select … from t1 where colA = 'k1'; -- Slow
select … from t1 where colB = 'k2'; -- Slow
select … from t1 where colA = 'k1' and colB = 'k2'; -- Quick
select … from t1 where colA = 'k1'; -- Slow
select … from t1 where colA = 'k1' and colB = 'k2'; -- Quick
select … from t1 where colB = 'k2'; -- ????
- 12. Storage Indexes (d)
create table t1 as
select
mod(rownum,1e3) scattered, -- 1,000 rows per value
trunc((rownum-1)/1e3) clustered, -- 1,000 rows per value
...
from {very large rowsource}
where rownum <= 1e6
;
Scattered: 1, 2, ..., 998, 999, 0, 1, 2, ..., 998, 999, 0
Every value appears in every MB
Clustered: 0, 0, 0, 0, ..., 1, 1, 1, 1, ... 999, 999, 999
Any given value appears in just one MB
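The effect of the two data patterns on a min/max storage index can be checked with a small simulation. This is a sketch under stated assumptions: `ROWS_PER_REGION` (how many rows fit in 1MB) is a made-up figure, and the helper names are invented for illustration. The two column expressions mirror the `mod` and `trunc` expressions in the `create table` statement.

```python
ROWS = 1_000_000
ROWS_PER_REGION = 10_000  # assumption: rows stored per 1MB region


def region_ranges(values, rows_per_region):
    """Return the (low, high) pair a min/max index would hold per region."""
    ranges = []
    for start in range(0, len(values), rows_per_region):
        chunk = values[start:start + rows_per_region]
        ranges.append((min(chunk), max(chunk)))
    return ranges


def regions_to_visit(ranges, target):
    """Count regions whose low/high range cannot rule out the target value."""
    return sum(1 for low, high in ranges if low <= target <= high)


rownums = range(1, ROWS + 1)
scattered = [r % 1000 for r in rownums]           # mod(rownum, 1e3)
clustered = [(r - 1) // 1000 for r in rownums]    # trunc((rownum-1)/1e3)

scattered_visits = regions_to_visit(region_ranges(scattered, ROWS_PER_REGION), 500)
clustered_visits = regions_to_visit(region_ranges(clustered, ROWS_PER_REGION), 500)
print(scattered_visits, clustered_visits)  # 100 1
```

With these assumed sizes the scattered column forces a visit to all 100 regions, while the clustered column needs only one, even though both columns have 1,000 rows per value.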
USPs
• Smart Scans / offload
• Smart flash cache
– (vs. Database flash cache)
• Storage Indexes
• Hybrid Columnar Compression
- 13. Compression
• More rows per block
– fewer blocks to be read from disk
– More CPU to extract rows
– More CPU to load rows
– More contention on modification
• Compress for OLTP - "deduplication"
• Compress for query/archive - HCC
HCC (a)
CU Header
Column 1
Column 2
Column 3
Column 4
Bitmap
Maximum size for "archive" ca. 256KB (probably)
Limited to ca. 32KB for "query" (probably)
Up to 32,759 Rows (almost certainly) for both
- 14. HCC (b)
Block boundaries
Block header + Row directory
HCC (c)
CU Header
Column 2
Column 3
Column 3
Column 4
Column 1
Column 1
Column 3
Bitmap
Row Header
Row Directory
Block Header
A CU is stored as a "single row"
The rows it holds are referenced individually by rowid
The row_number of a rowid is the row number within CU
The block address of a rowid is for the first block of the CU
- 15. HCC (d)
• Compression
– takes place at the database server
• Decompression
– Takes place at the db server for indexed access
– Takes place at the cell server for tablescans (usually)
– May have to take place at the db server for t/scans
– Can use a LOT of CPU.
HCC (e)
• Deletion
– We set one bit in the bitmap (and "lock" the CU)
• Updates
– We copy (migrate) the row to another block
• Which is stored as "compress for OLTP"
– We set the "deleted" bit (and "lock" the CU)
– We update every relevant index with the new rowid
– Smart scan disabled for this CU/MB
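The delete and update steps listed above can be modelled with a toy in-memory structure. This is purely illustrative and makes no claim about Oracle's internal formats: `ToyCU`, `update_row`, the `("cu", n)` / `("oltp", n)` location tuples, and the dictionary standing in for an index are all invented names, and the sketch ignores commit (when the "lock" would be released).

```python
class ToyCU:
    """Toy compression unit: rows plus a 'deleted' bitmap and a lock flag."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.deleted = [False] * len(self.rows)
        self.locked = False

    def delete(self, row_number):
        # Deletion: set one bit in the bitmap and "lock" the whole CU.
        self.deleted[row_number] = True
        self.locked = True


def update_row(cu, oltp_block, index, key, new_value):
    """Update a row that currently lives in the CU (toy model)."""
    location, row_number = index[key]
    if location != "cu":
        raise ValueError("this sketch only handles rows still in the CU")
    # Mark the old copy as deleted (which also locks the CU).
    cu.delete(row_number)
    # Migrate the new version of the row to another (OLTP-style) block.
    oltp_block.append(new_value)
    # Every index entry for the row must now point at the new location.
    index[key] = ("oltp", len(oltp_block) - 1)


cu = ToyCU(["a", "b", "c"])
oltp_block = []
index = {"a": ("cu", 0), "b": ("cu", 1), "c": ("cu", 2)}
update_row(cu, oltp_block, index, "b", "B")
print(cu.deleted, cu.locked, index["b"])
```

Even in this toy form the cost is visible: a single-row update touches the CU, a second block, and every index on the table.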
- 16. HCC - access by rowid (example)
• Load entire CU into db server cache.
• For each column used in query:
– "table fetch continued row" to start of column
– Decompress column into local memory
– Select column value
• How long can we keep the column ?
– Only for the equivalent of "buffer is pinned".
HCC - access by rowid (example)
select max(padding)
from t1_ah -- 33M (32 * 2^20) rows
where n_128k between 1000 and 1999 -- 256,000 rows
;
http://jonathanlewis.wordpress.com/2012/07/27/compression-units-3/
No compression: 0.70 CPU seconds
Query high: 77.24 CPU seconds
Archive high: 3,022.83 CPU seconds (16 seconds after manual optimization)
CU 1 CU 2 CU 3 CU 4 CU 5
1 6 2 7 3 8 4 9 5 ...
- 17. Optimizer Problem (a)
[Diagram: a segment header followed by the table's blocks; the rows we want are in four separate blocks, each needing 1 read]
What strategies can the optimizer adopt to acquire these four
rows when we run a query like:
select * from TABX where COLY = {constant}
Optimizer Problem (b)
select * from TABX where COLY = {constant}
1 Read 1 Read 1 Read 1 Read
- 18. Optimizer Problem (c)
Traditional Hardware
Say 200 byte rows, 40 per block
Say we can read 16 blocks as fast as 1
The break point could be as high as 1 row in 640.
Exadata
Say 32 blocks = 32,000 rows - handled in one read
You could get 14 reads in-flight
The cell server CPU does the work
The break point could easily be 1 row in 450,000
(without allowing for storage index and cell flash cache effects)
(and then indexed access wipes out your db server CPU !)
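The two break points can be reproduced with simple arithmetic. The sketch below assumes a deliberately crude cost model (one multiblock read costs about the same as one single-block read, and index block reads are ignored); the function name `break_point` is invented for illustration, and the figures are the ones quoted on the slide.

```python
def break_point(rows_per_block, blocks_per_read, reads_in_flight=1):
    """Rows a tablescan covers in the time of one single-block read.

    If the rows you want are rarer than 1 in this number, indexed access
    looks cheaper; if they are more common, the scan can win.
    """
    return rows_per_block * blocks_per_read * reads_in_flight


# Traditional hardware: 40 rows per block, 16 blocks read as fast as 1.
traditional = break_point(rows_per_block=40, blocks_per_read=16)

# Exadata: 32 blocks = 32,000 rows per read, so 1,000 rows per block,
# with 14 reads in flight at once.
exadata_rows_per_block = 32_000 // 32
exadata = break_point(exadata_rows_per_block, blocks_per_read=32, reads_in_flight=14)

print(traditional, exadata)  # 640 448000
```

The Exadata result of 448,000 is the figure the slide rounds to "1 row in 450,000".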
Summary
• Exadata seems to offer little to OLTP
• Putting OLTP and its DSS/DW on the same box may be nice
• Political or economic arguments may apply
• Go and see Frits Hoogland !
• Storage indexes may create instability
• Choose compression levels carefully
• Indexing strategies are harder to decide
• And the optimizer is ignorant of Exadata features
• Go and see Richard Foote and Maria Colgan !