1. Performance Stories from Exadata Migrations
Tanel Poder
http://blog.tanelpoder.com
http://tech.e2sn.com
tanel@tanelpoder.com
2. Intro: About me
• Tanel Põder
• As you already know, I'm an Oracle database geek (for the last 14 years)
• Also I did a huge career jump last year and now I'm the CEO of E2SN Ltd.
  • E2SN = Enterprise 2.0 Service Networks
  • Server & Database Consolidation, Virtualization, Capacity Management, Performance!
• I'm co-authoring an "Expert Oracle Exadata" book (with Apress)
  • Writing the performance, migrations and consolidation chapters…
3. About this session
• Exadata's architecture is awesome, no doubt!
• However, it's "just" hardware + software
• It's also new
• It's also complex (storage cells, RAC, ASM, PX, compression…)
• Not very well understood (yet)
• I'm offering some of my experiences so far…
4. Challenge – Migrating a large DW
• A very large telco DW (DB size was close to 100 TB then)
• Already compressed with 10g direct load block compression
  • Raw uncompressed data size: ¼ petabyte!
• Already heavily partitioned
• No indexes used in the source DB
• Source platform: Oracle 10.2 – SPARC / Solaris
• Target: Exadata V2 – Intel / Linux
• Exadata full rack with 15000 RPM SAS disks
  • 100 TB raw disk space
  • After mirroring & overhead: ~30 TB real space for data

The new hybrid columnar compression is critical; otherwise the data wouldn't fit into a single rack at all!
5. Challenge – Migrating a large DW
• Which migration approach to use (low downtime)?
• Some partitions known to be read only were (obviously) migrated in advance
• All data had to be compressed with Exadata Hybrid Columnar Compression (EHCC) immediately
  • In other words, all the data must be sent through the compression module during load
• EHCC works only on Exadata; you can't pre-compress data in the source:
  • ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage
• This ruled out all these options:
  • Physical Data Guard
  • Logical Data Guard
  • Streams / Golden Gate
  • ASM mirror snapshots
  • Transportable tablespaces

The DW loads were done NOLOGGING anyway, ruling out Streams/Golden Gate and other log-mining based approaches.
6. Challenge – Migrating a large DW
• Only two reasonable migration paths were left for large tables:
1. Data Pump
  • Extract data to files
  • Copy files (or share over NFS)
  • Load files into EHCC compressed tables/partitions
2. INSERT /*+ APPEND */ over database links
  • No need to dump data to files, copy & read again
  • Less overhead
  • Less complexity managing files
• We chose the INSERT over DB links approach (a minimal sketch below)
• But…
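A minimal sketch of the chosen approach, assuming hypothetical names (sales, sales_hcc, src_dw); the actual migration ran per-partition jobs, shown on a later slide:

-- Pre-create the target on Exadata with EHCC, so the direct path
-- load compresses rows as they arrive (all names hypothetical):
CREATE TABLE sales_hcc
COMPRESS FOR QUERY HIGH
AS SELECT * FROM sales@src_dw WHERE 1 = 0;

-- Direct path insert pulls rows over the database link and feeds
-- them straight into the HCC compression engine:
INSERT /*+ APPEND */ INTO sales_hcc
SELECT * FROM sales@src_dw;

COMMIT;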
7. Why is the data load slow? (Snapper)
--------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME
--------------------------------------------------------------------------------------------------------
1570, SYSTEM , STAT, bytes sent via SQL*Net to dblink , 536, 17.87,
1570, SYSTEM , STAT, bytes received via SQL*Net from dblink , 113985017, 3.8M,
1570, SYSTEM , STAT, SQL*Net roundtrips to/from dblink , 67, 2.23,
1570, SYSTEM , TIME, parse time elapsed , 842, 28.07us, .0%
1570, SYSTEM , TIME, DB CPU , 29739479, 991.32ms, 99.1%
1570, SYSTEM , TIME, sql execute elapsed time , 30031492, 1s, 100.1%
1570, SYSTEM , TIME, DB time , 30031492, 1s, 100.1%
1570, SYSTEM , WAIT, gc current multi block request , 3682, 122.73us, .0%
1570, SYSTEM , WAIT, gc cr block 3-way , 539, 17.97us, .0%
1570, SYSTEM , WAIT, gc current block 3-way , 459, 15.3us, .0%
1570, SYSTEM , WAIT, gc current grant 2-way , 209, 6.97us, .0%
1570, SYSTEM , WAIT, row cache lock , 2566, 85.53us, .0%
1570, SYSTEM , WAIT, SQL*Net message to dblink , 80, 2.67us, .0%
1570, SYSTEM , WAIT, SQL*Net message from dblink , 263979, 8.8ms, .9%
1570, SYSTEM , WAIT, SQL*Net more data from dblink , 113617, 3.79ms, .4%
1570, SYSTEM , WAIT, kfk: async disk IO , 7562, 252.07us, .0%
1570, SYSTEM , WAIT, events in waitclass Other , 5549, 184.97us, .0%
-- End of Stats snap 1, end=2010-06-28 12:09:51, seconds=30
-----------------------------------------------------------------------
Active% | SQL_ID        | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
    98% | 0zg4phrzs1s55 | ON CPU                    | ON CPU
     2% | 0zg4phrzs1s55 | SQL*Net message from dbli | Network

This load is CPU bound because of compression. Let's use parallel execution instead!
8. Why is the data load slow? (OStackProf)
SQL> @ostackprof 1151 0 100
-- oStackProf v1.00 - EXPERIMENTAL script by Tanel Poder ( http://www.tanelpoder.com )
Below is the stack prefix common to all samples:
------------------------------------------------------------------------
Frame->function()
------------------------------------------------------------------------
[…some output snipped…]
# 15 ->kdza_get_layout()
# 14 ->kdzains()
# 13 ->kdzanalyze()
# 12 ->kdza_best_col_trans()
# 11 ->kdza_analyze_col_trans()
# 10 ->kdza_compress_col()
# 9 ->kdzc_comp_col_analyzer()
# ...(see call profile below)
#
# -#--------------------------------------------------------------------
# - Num.Samples -> in call stack()
# ----------------------------------------------------------------------
20 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
19 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
12 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
8 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
7 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
9. Exadata Hybrid Columnar Compression (EHCC)
• COMPRESS FOR DIRECT_LOAD OPERATIONS
  • Single block column data is deduplicated during load
  • CTAS and direct path loads only (since 9i)
• COMPRESS FOR OLTP
  • Single block column data is deduplicated by the FG process once it gets full *
• Hybrid Columnar Compression
  • Multi-MB column-oriented compression units are zipped!
  • COMPRESS FOR QUERY LOW - LZO - fastest
  • COMPRESS FOR QUERY HIGH - ZLIB (like gzip, lower level)
  • COMPRESS FOR ARCHIVE LOW - ZLIB (higher level)
  • COMPRESS FOR ARCHIVE HIGH - BZIP2 (slowest, best compression)

Note: the first two are the old-fashioned dictionary deduplication method. Every block has a column dictionary in the block header, so only one block has to be read for single row access.
* I had it wrong in the initial slides demoed at UKOUG
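The corresponding syntax, as a hedged sketch (source_t and the target table names are hypothetical; the EHCC variants raise ORA-64307 unless the tablespace is on Exadata storage):

-- Pre-Exadata compression (all names hypothetical):
CREATE TABLE t_direct COMPRESS FOR DIRECT_LOAD OPERATIONS AS SELECT * FROM source_t;
CREATE TABLE t_oltp   COMPRESS FOR OLTP                   AS SELECT * FROM source_t;

-- EHCC, Exadata storage only:
CREATE TABLE t_qlow  COMPRESS FOR QUERY LOW    AS SELECT * FROM source_t;  -- LZO
CREATE TABLE t_qhigh COMPRESS FOR QUERY HIGH   AS SELECT * FROM source_t;  -- ZLIB
CREATE TABLE t_alow  COMPRESS FOR ARCHIVE LOW  AS SELECT * FROM source_t;  -- ZLIB, higher level
CREATE TABLE t_ahigh COMPRESS FOR ARCHIVE HIGH AS SELECT * FROM source_t;  -- BZIP2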
10. All cells almost idle when doing a large data load ???!!!
11. Compression and large data loads (migrations)
• De-compression can be done either at the cell or database layer
  • Thanks to the column orientation of EHCC compression units, the smart scan can read & uncompress only the required columns and avoid IO even further
• Compression is done only at the database layer
  • One of the reasons is probably redo logging: if compression happened at the cell level, how would compressed block images be sent back for redo logging?
• Be aware that in real-life migrations into HCC compressed tables, 5 TB/hr data loads are not achievable
• Better test before estimating migration downtime!
12. DB link data transfer
• A 10 Gbit Ethernet adapter was installed into the source server
  • And a 10 GbE <-> InfiniBand switch
  • ~1 GB per second transfer capacity?
• INSERT /*+ APPEND */ SELECT
  • Single session
  • This gave only a ~3-5 MB/s data transfer rate due to the compression CPU bottleneck
• INSERT /*+ APPEND PARALLEL … */ SELECT …
  • Up to 256 threads (automatic parallelism)
  • Still only a 60-70 MB/sec network transfer rate over the 10 GbE link ???
13. Parallel direct path insert
• INSERT /*+ APPEND PARALLEL (x) */
• Now we used up to 64 CPU cores and the CPU bottleneck was shifted
• But this gave only 60-70 MB/sec network throughput!

[Diagram: PX slaves and the QC on the source DB, the QC and PX slaves on Exadata, with a single QC-to-QC database link connection in between]

Regardless of parallelism, the DB link's SQL*Net traffic flows between the Query Coordinators only!

Theoretical max throughput per connection: Exadata <-> SPARC server network RTT = 0.5 ms, so 32 kB SDU / 0.5 ms RTT = 64 MB/sec.
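Since per-connection throughput is roughly SDU size over round-trip time, raising the SQL*Net SDU is one lever. A sketch of the connect descriptor, assuming a hypothetical alias, host and service (32767 is the pre-11.2 maximum SDU, matching the 32 kB figure above):

# tnsnames.ora entry on the source side (alias/host/service hypothetical):
EXA_TARGET =
  (DESCRIPTION =
    (SDU = 32767)   # larger session data unit per SQL*Net buffer
    (ADDRESS = (PROTOCOL = TCP)(HOST = exadata-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = dw_target))
  )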
14. Challenge – Migrating a large DW
• Solution:
1. Run many serial INSERT APPENDs instead of one parallel
2. Separate insert for each sub-partition
  • INSERT /*+ APPEND */ INTO tableX SUBPARTITION (p) SELECT …
  • When inserting into a partition, only that partition (not the entire table) is locked
3. Create views local to the source table, using the SELECT … SUBPARTITION(p) syntax
  • So that the full source table wouldn't be scanned for each partition load
  • The PARTITION(p) keyword is not sent to the source DB over the dblink!
  • For range/list partitioned tables just use a WHERE clause on the partition key
    - Can't use this syntax for selecting out individual hash partitions
4. Multiple jobs were scheduled which transferred partitions over dblinks (sketched below)
  • A control table kept track of which partitions were transferred
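A minimal sketch of steps 2-3, assuming hypothetical names (sales, sp_2010q1, sales_hcc, src_dw). The view sits in the source DB, so the subpartition clause is applied there instead of being stripped off by the dblink:

-- On the SOURCE database: one view per (sub)partition, so the
-- remote side scans only that subpartition (names hypothetical):
CREATE OR REPLACE VIEW sales_sp_2010q1 AS
SELECT * FROM sales SUBPARTITION (sp_2010q1);

-- On Exadata: a serial direct path insert per subpartition; only
-- that subpartition is locked, so many jobs can run concurrently:
INSERT /*+ APPEND */ INTO sales_hcc SUBPARTITION (sp_2010q1)
SELECT * FROM sales_sp_2010q1@src_dw;
COMMIT;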
15. The default parallel degree allows you to easily use up the whole cluster's resources.
16. Checking the real parallelism - 1
• Run a query with multiple UNION ALLs:
SQL> show parameter parallel_degree
NAME TYPE VALUE
------------------------------------ ---------- -----------------------------------
parallel_degree_limit string CPU
parallel_degree_policy string MANUAL
SQL> alter session force parallel query parallel 2;
Session altered.
SQL> select count(*) from t union all
2 select count(*) from t union all
3 select count(*) from t union all
4 select count(*) from t;
COUNT(*)
----------
27538614
27538614
27538614
27538614
17. Checking the real parallelism - 2
• Check V$PQ_TQSTAT after the query completes:
SQL> @tq
Show PX Table Queue statistics from last Parallel Execution in this session...
TQ_ID TQ_FLOW DFO
(DFO,SET) DIRECTION NUM_ROWS BYTES WAITS TIMEOUTS PROCESS NUMBER TQ_ID
---------- --------- ---------- ---------- ---------- ---------- ------- ------ ------
:TQ10000 Produced 1 36 22 0 P000 1 0
Produced 1 36 9 0 P001
Consumed 2 72 34 1 QC
:TQ20000 Produced 1 36 23 0 P002 2 0
Produced 1 36 8 0 P003
Consumed 2 72 34 1 QC
:TQ30000 Produced 1 36 9 0 P005 3 0
Produced 1 36 21 0 P004
Consumed 2 72 34 1 QC
:TQ40000 Produced 1 36 8 0 P007 4 0
Produced 1 36 23 0 P006
Consumed 2 72 34 1 QC
18. PX execution flow

[Diagram: PX slaves P001-P004 produce rows into a Table Queue (TQ), PX slaves P005-P008 consume them, and the Query Coordinator reads the final result]
19. Producer-consumer hierarchy

[Diagram: two PX slave sets exchanging rows through Table Queues; first stage - P001/P002 produce into TQ1,00 consumed by P003/P004; second stage - P003/P004 produce into TQ1,01 consumed by P001/P002, whose output flows through TQ1,02 to the QC]
20. Do you need indexes on Exadata or not?
• It all depends on whether your workload & physical design are right for smart full scanning:
1. How often and how selectively do you access data?
  • OLTP databases do need indexes!
  • So do many DWs which are used as a reference data lookup warehouse, not just for bulk reporting
2. What other IO elimination methods do you have in place?
  • In DW & reporting your database absolutely needs to be partitioned
  • And the code has to be designed for partition pruning (see the sketch below)
  • Storage indexes will help too (unless you hit a bug), but partition pruning is absolutely needed
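A hedged illustration of pruning-friendly vs. pruning-hostile predicates; sales and sale_date are hypothetical names, with sales assumed range-partitioned on sale_date:

-- Prunes: the optimizer maps the literal date range to partitions.
SELECT SUM(amount) FROM sales
WHERE  sale_date >= DATE '2010-06-01'
AND    sale_date <  DATE '2010-07-01';

-- Does NOT prune: wrapping the partition key in a function hides
-- it from partition elimination, forcing a scan of all partitions.
SELECT SUM(amount) FROM sales
WHERE  TO_CHAR(sale_date, 'YYYYMM') = '201006';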
21. Exadata Performance Antipatterns - DW
• You don't use direct path full table scans for DW & reporting
  • Direct path reads are needed for smart scan to kick in
  • Hash join bloom filter pushdown to storage kicks in with smart scan only
• You still rely on bitmap indexes because of past habits
  • Bitmap index-based access ends up with "random" single block reads!
  • Smart scan isn't used with single block reads
  • NB! If your schema doesn't allow good partition pruning, then bitmap index access may still be the next best option for DW
  • It also depends on the nature of your query and the selectivities of predicates
• You don't limit the real PX parallelism
  • Setting the parallel degree alone does not limit the real parallelism
  • Until 11.2.0.2 the Resource Manager is the only way to limit the real parallelism (sketched below)
  • parallel_degree_policy = auto is the future (seems to work better in 11.2.0.2)
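A hedged sketch of capping per-query parallelism with the Resource Manager (the plan name and the DOP limit of 16 are hypothetical; PARALLEL_DEGREE_LIMIT_P1 is the 11.2 plan directive parameter):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'DW_PLAN',
    comment => 'cap real PX parallelism');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan                     => 'DW_PLAN',
    group_or_subplan         => 'OTHER_GROUPS',
    comment                  => 'limit every query to DOP 16',
    parallel_degree_limit_p1 => 16);
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
-- Activate the plan:
ALTER SYSTEM SET resource_manager_plan = 'DW_PLAN';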
22. Exadata Performance Antipatterns - OLTP
• Every (valid) RAC performance antipattern
  • As Exadata is still a RAC cluster
  • 8 nodes or 2 nodes (X2-8) per full rack
  • gc buffer busy wait contention!
  • You had better do some workload management to avoid write-write contention across many RAC nodes
• You compress frequently used data with QUERY/ARCHIVE
  • Every single block/row read has to uncompress the whole (zipped) compression unit: a big overhead when fetching many single rows
• You don't use flash cache
  • Without flash cache, Exadata for OLTP workloads is no different from a bunch of Linux pizzaboxes packed into a rack
23. Performance bugs and slow smart scan processing?
• How to troubleshoot slow smart scan processing?
  • Make sure direct path IO is used
  • Smart scan is usable only with (serial or parallel) direct path scans
• How to check what's happening yourself?
  • Use the systematic approach explained in my article:
  • http://tech.e2sn.com/oracle/exadata/performance-troubleshooting/exadata-smart-scan-performance
  • Or just google for "troubleshooting exadata performance"
24. Things affecting direct path read decisions
• Serial execution can switch to direct path reads
  • Dynamic adaptive direct reads
  • _small_table_threshold
  • _very_large_table_threshold
  • Segment's cached block count (X$KCBOQH.NUM_BUF)
• Parallel execution can switch to buffered reads in 11.2
  • When parallel_degree_policy = AUTO
  • Buffered reads = no smart scan
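A hedged sketch of checking a segment's cached block count (X$ views need a SYS connection; the table name T is hypothetical):

-- How many buffers of the segment are cached in this instance?
-- A large cached fraction can tip the adaptive decision away
-- from direct path reads (and thus away from smart scan).
SELECT o.object_name, SUM(h.num_buf) AS cached_blocks
FROM   x$kcboqh h, dba_objects o
WHERE  h.obj# = o.data_object_id
AND    o.object_name = 'T'
GROUP BY o.object_name;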
25. The kind of queries which won't benefit from Exadata
• SELECT projection lists can't be offloaded

SELECT
    col1, col2,
    CASE WHEN (col3, col4) IN (
             ( 123, 234 ),
             ( 124, 456 ),
             ( 130, 789 ),
             …this goes on hundreds of times…
             ( 140, 888 )
         ) …
FROM
    t1
  , t2
  , tX
WHERE
    t1.colX = …
AND t1.id = t2.id …

Such large CASE statements aren't somehow offloaded to storage and aren't optimized by Exadata. Also, Exadata can't do anything about expensive PL/SQL functions in the SELECT list or WHERE clause. Exadata is designed to find matching data efficiently.

Which built-in functions can be offloaded:

SELECT * FROM v$sqlfn_metadata WHERE offloadable = 'YES';
26. System statistics on Exadata?
• DW & smart scan-oriented databases
  • DW & smart scans: use the default noworkload stats
  • MREADTIM / SREADTIM info is not gathered anyway when the workload runs smart scans only
• OLTP & single block read-oriented databases
  • Slow spinning disk IO vs flash cache IO
  • If running the entire DB on flash: gather workload stats
  • If running on mixed storage: I'd leave noworkload stats (or set stats which show regular disk IO response times, 5 ms+)
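The corresponding DBMS_STATS calls, as a hedged sketch (which one to run is exactly the decision above):

-- Default: noworkload system statistics (fits smart scan heavy DWs):
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');

-- OLTP running fully on flash: capture real IO timings while a
-- representative workload runs:
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('START');
-- ... let the workload run, then:
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('STOP');

-- Mixed storage: set explicit values reflecting spinning disk IO:
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 5);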
27. Exadata works well!
• Unless you hit bugs…
• …And your schema design allows partition pruning
• …And you actually use direct path segment scans
• …And you have good optimizer statistics
  • Especially if you do use indexes, so that the optimizer wouldn't end up choosing an index instead of a smart scan (due to underestimated row counts)
• …And you don't carry over the old way of thinking
  • Remove all old hints, rethink bitmap indexes, ensure partition pruning
28. Questions?! Thanks!!!
tanel@tanelpoder.com
^^^^
Further questions: Exadata consolidation, migration, troubleshooting, performance & capacity planning