1. Performance Stories from Exadata Migrations
Tanel Poder
http://blog.tanelpoder.com
http://tech.e2sn.com
tanel@tanelpoder.com
2. Intro: About me
• Tanel Põder
• As you already know, I'm an Oracle database geek (for the last 14 years)
• Also I did a huge career jump last year and now I'm the CEO of E2SN Ltd.
  • E2SN = Enterprise 2.0 Service Networks
  • Server & Database Consolidation, Virtualization, Capacity Management, Performance!
• I'm co-authoring an "Expert Oracle Exadata" book (with Apress)
  • Writing the performance, migrations and consolidation chapters…
3. About this session
• Exadata's architecture is awesome, no doubt!
• However, it's "just" hardware + software
• It's also new
• It's also complex (storage cells, RAC, ASM, PX, compression…)
• Not very well understood (yet)
• I'm offering some of my experiences so far…
4. Challenge – Migrating a large DW
• A very large telco DW (DB size was close to 100 TB then)
• Already compressed with 10g direct load block compression
  • Raw uncompressed data size: ¼ petabyte!
• Already heavily partitioned
• No indexes used in the source DB
• Source platform: Oracle 10.2 – SPARC / Solaris
• Target: Exadata V2 – Intel / Linux
• Exadata full rack with 15000 RPM SAS disks
  • 100 TB raw disk space
  • After mirroring & overhead: ~30 TB real space for data

The new hybrid columnar compression is critical; otherwise the data wouldn't fit into a single rack at all!
5. Challenge – Migrating a large DW
• Which migration approach to use (low downtime)?
• Some partitions known to be read only were (obviously) migrated in advance
• All data had to be compressed with Exadata Hybrid Columnar Compression (EHCC) immediately
  • In other words, all the data must be sent through the compression module during load
• EHCC works only on Exadata; you can't pre-compress data in the source:
  • ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage
• This ruled out all these options:
  • Physical Data Guard
  • Logical Data Guard
  • Streams / Golden Gate
  • ASM mirror snapshots
  • Transportable tablespaces

The DW loads were done NOLOGGING anyway, ruling out Streams/Golden Gate and other log-mining based approaches.
6. Challenge – Migrating a large DW
• Only two reasonable migration paths were left for large tables:
1. Data Pump
  • Extract data to files
  • Copy files (or share over NFS)
  • Load files into EHCC compressed tables/partitions
2. INSERT /*+ APPEND */ over database links
  • No need to dump data to files, copy & read again
  • Less overhead
  • Less complexity managing files
• We chose the INSERT over DB links approach (a minimal sketch below)
• But…
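A minimal sketch of the chosen approach, assuming hypothetical names (sales, sales_hcc, src_dw); the actual migration ran per-partition jobs, shown on a later slide:

-- Pre-create the target on Exadata with EHCC, so the direct path
-- load compresses rows as they arrive (all names hypothetical):
CREATE TABLE sales_hcc
COMPRESS FOR QUERY HIGH
AS SELECT * FROM sales@src_dw WHERE 1 = 0;

-- Direct path insert pulls rows over the database link and feeds
-- them straight into the HCC compression engine:
INSERT /*+ APPEND */ INTO sales_hcc
SELECT * FROM sales@src_dw;

COMMIT;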
7. Why is the data load slow? (Snapper)
--------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME
--------------------------------------------------------------------------------------------------------
1570, SYSTEM , STAT, bytes sent via SQL*Net to dblink , 536, 17.87,
1570, SYSTEM , STAT, bytes received via SQL*Net from dblink , 113985017, 3.8M,
1570, SYSTEM , STAT, SQL*Net roundtrips to/from dblink , 67, 2.23,
1570, SYSTEM , TIME, parse time elapsed , 842, 28.07us, .0%
1570, SYSTEM , TIME, DB CPU , 29739479, 991.32ms, 99.1%
1570, SYSTEM , TIME, sql execute elapsed time , 30031492, 1s, 100.1%
1570, SYSTEM , TIME, DB time , 30031492, 1s, 100.1%
1570, SYSTEM , WAIT, gc current multi block request , 3682, 122.73us, .0%
1570, SYSTEM , WAIT, gc cr block 3-way , 539, 17.97us, .0%
1570, SYSTEM , WAIT, gc current block 3-way , 459, 15.3us, .0%
1570, SYSTEM , WAIT, gc current grant 2-way , 209, 6.97us, .0%
1570, SYSTEM , WAIT, row cache lock , 2566, 85.53us, .0%
1570, SYSTEM , WAIT, SQL*Net message to dblink , 80, 2.67us, .0%
1570, SYSTEM , WAIT, SQL*Net message from dblink , 263979, 8.8ms, .9%
1570, SYSTEM , WAIT, SQL*Net more data from dblink , 113617, 3.79ms, .4%
1570, SYSTEM , WAIT, kfk: async disk IO , 7562, 252.07us, .0%
1570, SYSTEM , WAIT, events in waitclass Other , 5549, 184.97us, .0%
-- End of Stats snap 1, end=2010-06-28 12:09:51, seconds=30
-----------------------------------------------------------------------
Active% | SQL_ID        | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
    98% | 0zg4phrzs1s55 | ON CPU                    | ON CPU
     2% | 0zg4phrzs1s55 | SQL*Net message from dbli | Network

This load is CPU bound because of compression. Let's use parallel execution instead!
8. Why is the data load slow? (OStackProf)
SQL> @ostackprof 1151 0 100
-- oStackProf v1.00 - EXPERIMENTAL script by Tanel Poder ( http://www.tanelpoder.com )
Below is the stack prefix common to all samples:
------------------------------------------------------------------------
Frame->function()
------------------------------------------------------------------------
[…some output snipped…]
# 15 ->kdza_get_layout()
# 14 ->kdzains()
# 13 ->kdzanalyze()
# 12 ->kdza_best_col_trans()
# 11 ->kdza_analyze_col_trans()
# 10 ->kdza_compress_col()
# 9 ->kdzc_comp_col_analyzer()
# ...(see call profile below)
#
# -#--------------------------------------------------------------------
# - Num.Samples -> in call stack()
# ----------------------------------------------------------------------
20 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
19 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
12 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
8 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
7 ->kdzc_comp_buffer()->kgccdo()->kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->
9. Exadata Hybrid Columnar Compression (EHCC)
• COMPRESS FOR DIRECT_LOAD OPERATIONS
  • Single block column data is deduplicated during load
  • CTAS and direct path loads only (since 9i)
• COMPRESS FOR OLTP
  • Single block column data is deduplicated by the FG process once it gets full *
• Hybrid Columnar Compression
  • Multi-MB column-oriented compression units are zipped!
  • COMPRESS FOR QUERY LOW - LZO - fastest
  • COMPRESS FOR QUERY HIGH - ZLIB (like gzip, lower level)
  • COMPRESS FOR ARCHIVE LOW - ZLIB (higher level)
  • COMPRESS FOR ARCHIVE HIGH - BZIP2 (slowest, best compression)

Note: the first two are the old-fashioned dictionary deduplication method. Every block has a column dictionary in the block header, so only one block has to be read for single row access.
* I had it wrong in the initial slides demoed at UKOUG
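The corresponding syntax, as a hedged sketch (source_t and the target table names are hypothetical; the EHCC variants raise ORA-64307 unless the tablespace is on Exadata storage):

-- Pre-Exadata compression (all names hypothetical):
CREATE TABLE t_direct COMPRESS FOR DIRECT_LOAD OPERATIONS AS SELECT * FROM source_t;
CREATE TABLE t_oltp   COMPRESS FOR OLTP                   AS SELECT * FROM source_t;

-- EHCC, Exadata storage only:
CREATE TABLE t_qlow  COMPRESS FOR QUERY LOW    AS SELECT * FROM source_t;  -- LZO
CREATE TABLE t_qhigh COMPRESS FOR QUERY HIGH   AS SELECT * FROM source_t;  -- ZLIB
CREATE TABLE t_alow  COMPRESS FOR ARCHIVE LOW  AS SELECT * FROM source_t;  -- ZLIB, higher level
CREATE TABLE t_ahigh COMPRESS FOR ARCHIVE HIGH AS SELECT * FROM source_t;  -- BZIP2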
10. All cells almost idle when doing a large data load ???!!!
11. Compression and large data loads (migrations)
• De-compression can be done either at the cell or database layer
  • Thanks to the column orientation of EHCC compression units, the smart scan can read & uncompress only the required columns and avoid IO even further
• Compression is done only at the database layer
  • One of the reasons is probably redo logging: if compression happened at the cell level, how would compressed block images be sent back for redo logging?
• Be aware that in real-life migrations into HCC compressed tables, 5 TB/hr data loads are not achievable
• Better test before estimating migration downtime!
12. DB link data transfer
• A 10 Gbit Ethernet adapter was installed into the source server
  • And a 10 GbE <-> InfiniBand switch
  • ~1 GB per second transfer capacity?
• INSERT /*+ APPEND */ SELECT
  • Single session
  • This gave only a ~3-5 MB/s data transfer rate due to the compression CPU bottleneck
• INSERT /*+ APPEND PARALLEL … */ SELECT …
  • Up to 256 threads (automatic parallelism)
  • Still only a 60-70 MB/sec network transfer rate over the 10 GbE link ???
13. Parallel direct path insert
• INSERT /*+ APPEND PARALLEL (x) */
• Now we used up to 64 CPU cores and the CPU bottleneck was shifted
• But this gave only 60-70 MB/sec network throughput!

[Diagram: PX slaves and the QC on the source DB, the QC and PX slaves on Exadata, with a single QC-to-QC database link connection in between]

Regardless of parallelism, the DB link's SQL*Net traffic flows between the Query Coordinators only!

Theoretical max throughput per connection: Exadata <-> SPARC server network RTT = 0.5 ms, so 32 kB SDU / 0.5 ms RTT = 64 MB/sec.
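Since per-connection throughput is roughly SDU size over round-trip time, raising the SQL*Net SDU is one lever. A sketch of the connect descriptor, assuming a hypothetical alias, host and service (32767 is the pre-11.2 maximum SDU, matching the 32 kB figure above):

# tnsnames.ora entry on the source side (alias/host/service hypothetical):
EXA_TARGET =
  (DESCRIPTION =
    (SDU = 32767)   # larger session data unit per SQL*Net buffer
    (ADDRESS = (PROTOCOL = TCP)(HOST = exadata-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = dw_target))
  )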
14. Challenge – Migrating a large DW
• Solution:
1. Run many serial INSERT APPENDs instead of one parallel
2. Separate insert for each sub-partition
  • INSERT /*+ APPEND */ INTO tableX SUBPARTITION (p) SELECT …
  • When inserting into a partition, only that partition (not the entire table) is locked
3. Create views local to the source table, using the SELECT … SUBPARTITION(p) syntax
  • So that the full source table wouldn't be scanned for each partition load
  • The PARTITION(p) keyword is not sent to the source DB over the dblink!
  • For range/list partitioned tables just use a WHERE clause on the partition key
    - Can't use this syntax for selecting out individual hash partitions
4. Multiple jobs were scheduled which transferred partitions over dblinks (sketched below)
  • A control table kept track of which partitions were transferred
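A minimal sketch of steps 2-3, assuming hypothetical names (sales, sp_2010q1, sales_hcc, src_dw). The view sits in the source DB, so the subpartition clause is applied there instead of being stripped off by the dblink:

-- On the SOURCE database: one view per (sub)partition, so the
-- remote side scans only that subpartition (names hypothetical):
CREATE OR REPLACE VIEW sales_sp_2010q1 AS
SELECT * FROM sales SUBPARTITION (sp_2010q1);

-- On Exadata: a serial direct path insert per subpartition; only
-- that subpartition is locked, so many jobs can run concurrently:
INSERT /*+ APPEND */ INTO sales_hcc SUBPARTITION (sp_2010q1)
SELECT * FROM sales_sp_2010q1@src_dw;
COMMIT;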
15. The default parallel degree allows you to easily use up the whole cluster's resources.
16. Checking the real parallelism - 1
• Run a query with multiple UNION ALLs:
SQL> show parameter parallel_degree
NAME TYPE VALUE
------------------------------------ ---------- -----------------------------------
parallel_degree_limit string CPU
parallel_degree_policy string MANUAL
SQL> alter session force parallel query parallel 2;
Session altered.
SQL> select count(*) from t union all
2 select count(*) from t union all
3 select count(*) from t union all
4 select count(*) from t;
COUNT(*)
----------
27538614
27538614
27538614
27538614
17. Checking the real parallelism - 2
• Check V$PQ_TQSTAT after the query completes:
SQL> @tq
Show PX Table Queue statistics from last Parallel Execution in this session...
TQ_ID TQ_FLOW DFO
(DFO,SET) DIRECTION NUM_ROWS BYTES WAITS TIMEOUTS PROCESS NUMBER TQ_ID
---------- --------- ---------- ---------- ---------- ---------- ------- ------ ------
:TQ10000 Produced 1 36 22 0 P000 1 0
Produced 1 36 9 0 P001
Consumed 2 72 34 1 QC
:TQ20000 Produced 1 36 23 0 P002 2 0
Produced 1 36 8 0 P003
Consumed 2 72 34 1 QC
:TQ30000 Produced 1 36 9 0 P005 3 0
Produced 1 36 21 0 P004
Consumed 2 72 34 1 QC
:TQ40000 Produced 1 36 8 0 P007 4 0
Produced 1 36 23 0 P006
Consumed 2 72 34 1 QC
18. PX execution flow

[Diagram: PX slaves P001-P004 produce rows into a Table Queue (TQ), PX slaves P005-P008 consume them, and the Query Coordinator reads the final result]
19. Producer-consumer hierarchy

[Diagram: two PX slave sets exchanging rows through Table Queues; first stage - P001/P002 produce into TQ1,00 consumed by P003/P004; second stage - P003/P004 produce into TQ1,01 consumed by P001/P002, whose output flows through TQ1,02 to the QC]
20. Do you need indexes on Exadata or not?
• It all depends on whether your workload & physical design are right for smart full scanning:
1. How often and how selectively do you access data?
  • OLTP databases do need indexes!
  • So do many DWs which are used as a reference data lookup warehouse, not just for bulk reporting
2. What other IO elimination methods do you have in place?
  • In DW & reporting your database absolutely needs to be partitioned
  • And the code has to be designed for partition pruning (see the sketch below)
  • Storage indexes will help too (unless you hit a bug), but partition pruning is absolutely needed
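A hedged illustration of pruning-friendly vs. pruning-hostile predicates; sales and sale_date are hypothetical names, with sales assumed range-partitioned on sale_date:

-- Prunes: the optimizer maps the literal date range to partitions.
SELECT SUM(amount) FROM sales
WHERE  sale_date >= DATE '2010-06-01'
AND    sale_date <  DATE '2010-07-01';

-- Does NOT prune: wrapping the partition key in a function hides
-- it from partition elimination, forcing a scan of all partitions.
SELECT SUM(amount) FROM sales
WHERE  TO_CHAR(sale_date, 'YYYYMM') = '201006';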
21. Exadata Performance Antipatterns - DW
• You don't use direct path full table scans for DW & reporting
  • Direct path reads are needed for smart scan to kick in
  • Hash join bloom filter pushdown to storage kicks in with smart scan only
• You still rely on bitmap indexes because of past habits
  • Bitmap index-based access ends up with "random" single block reads!
  • Smart scan isn't used with single block reads
  • NB! If your schema doesn't allow good partition pruning, then bitmap index access may still be the next best option for DW
  • It also depends on the nature of your query and the selectivities of predicates
• You don't limit the real PX parallelism
  • Setting the parallel degree alone does not limit the real parallelism
  • Until 11.2.0.2 the Resource Manager is the only way to limit the real parallelism (sketched below)
  • parallel_degree_policy = auto is the future (seems to work better in 11.2.0.2)
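A hedged sketch of capping per-query parallelism with the Resource Manager (the plan name and the DOP limit of 16 are hypothetical; PARALLEL_DEGREE_LIMIT_P1 is the 11.2 plan directive parameter):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'DW_PLAN',
    comment => 'cap real PX parallelism');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan                     => 'DW_PLAN',
    group_or_subplan         => 'OTHER_GROUPS',
    comment                  => 'limit every query to DOP 16',
    parallel_degree_limit_p1 => 16);
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
-- Activate the plan:
ALTER SYSTEM SET resource_manager_plan = 'DW_PLAN';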
22. Exadata Performance Antipatterns - OLTP
• Every (valid) RAC performance antipattern
  • As Exadata is still a RAC cluster
  • 8 nodes or 2 nodes (X2-8) per full rack
  • gc buffer busy wait contention!
  • You had better do some workload management to avoid write-write contention across many RAC nodes
• You compress frequently used data with QUERY/ARCHIVE
  • Every single block/row read has to uncompress the whole (zipped) compression unit: a big overhead when fetching many single rows
• You don't use flash cache
  • Without flash cache, Exadata for OLTP workloads is no different from a bunch of Linux pizzaboxes packed into a rack
23. Performance bugs and slow smart scan processing?
• How to troubleshoot slow smart scan processing?
  • Make sure direct path IO is used
  • Smart scan is usable only with (serial or parallel) direct path scans
• How to check what's happening yourself?
  • Use the systematic approach explained in my article:
  • http://tech.e2sn.com/oracle/exadata/performance-troubleshooting/exadata-smart-scan-performance
  • Or just google for "troubleshooting exadata performance"
24. Things affecting direct path read decisions
• Serial execution can switch to direct path reads
  • Dynamic adaptive direct reads
  • _small_table_threshold
  • _very_large_table_threshold
  • Segment's cached block count (X$KCBOQH.NUM_BUF)
• Parallel execution can switch to buffered reads in 11.2
  • When parallel_degree_policy = AUTO
  • Buffered reads = no smart scan
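A hedged sketch of checking a segment's cached block count (X$ views need a SYS connection; the table name T is hypothetical):

-- How many buffers of the segment are cached in this instance?
-- A large cached fraction can tip the adaptive decision away
-- from direct path reads (and thus away from smart scan).
SELECT o.object_name, SUM(h.num_buf) AS cached_blocks
FROM   x$kcboqh h, dba_objects o
WHERE  h.obj# = o.data_object_id
AND    o.object_name = 'T'
GROUP BY o.object_name;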
25. The kind of queries which won't benefit from Exadata
• SELECT projection lists can't be offloaded

SELECT
    col1, col2,
    CASE WHEN (col3, col4) IN (
             ( 123, 234 ),
             ( 124, 456 ),
             ( 130, 789 ),
             …this goes on hundreds of times…
             ( 140, 888 )
         ) …
FROM
    t1
  , t2
  , tX
WHERE
    t1.colX = …
AND t1.id = t2.id …

Such large CASE statements aren't somehow offloaded to storage and aren't optimized by Exadata. Also, Exadata can't do anything about expensive PL/SQL functions in the SELECT list or WHERE clause. Exadata is designed to find matching data efficiently.

Which built-in functions can be offloaded:

SELECT * FROM v$sqlfn_metadata WHERE offloadable = 'YES';
26. System statistics on Exadata?
• DW & smart scan-oriented databases
  • DW & smart scans: use the default noworkload stats
  • MREADTIM / SREADTIM info is not gathered anyway when the workload runs smart scans only
• OLTP & single block read-oriented databases
  • Slow spinning disk IO vs flash cache IO
  • If running the entire DB on flash: gather workload stats
  • If running on mixed storage: I'd leave noworkload stats (or set stats which show regular disk IO response times, 5 ms+)
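The corresponding DBMS_STATS calls, as a hedged sketch (which one to run is exactly the decision above):

-- Default: noworkload system statistics (fits smart scan heavy DWs):
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');

-- OLTP running fully on flash: capture real IO timings while a
-- representative workload runs:
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('START');
-- ... let the workload run, then:
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('STOP');

-- Mixed storage: set explicit values reflecting spinning disk IO:
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 5);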
27. Exadata works well!
• Unless you hit bugs…
• …And your schema design allows partition pruning
• …And you actually use direct path segment scans
• …And you have good optimizer statistics
  • Especially if you do use indexes, so that the optimizer wouldn't end up choosing an index instead of a smart scan (due to underestimated row counts)
• …And you don't carry over the old way of thinking
  • Remove all old hints, rethink bitmap indexes, ensure partition pruning
28. Questions?! Thanks!!!
tanel@tanelpoder.com
^^^^
Further questions: Exadata consolidation, migration, troubleshooting, performance & capacity planning