Knowing how to set up good benchmarks is invaluable in understanding the performance of a system. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. Done right, benchmarks guide teams toward improving the performance of their systems. Done wrong, hours of effort may result in a worse-performing application, upset customers, or worse! In this talk, we will discuss what you need to know to write better benchmarks for distributed systems. We will look at examples of bad benchmarks and learn which biases can invalidate the measurements, in the hope of correctly applying our new-found skills and avoiding such pitfalls in the future.
4. Benchmark = How Fast?
• Your process vs Goal
• Your process vs Best Practices
5. Today
• How Not to Write Benchmarks
• Benchmark Setup & Results:
  - You're wrong about machines
  - You're wrong about stats
  - You're wrong about what matters
• Becoming Less Wrong
• Having Fun with Riak
28. Wrong About Stats
[Chart: "Convergence of Median on Samples"; Latency vs Time; series: Stable Samples, Stable Median, Decaying Samples, Decaying Median]
29. Website Serving Images
• Access 1 image 1000 times
• Latency measured for each access
• Start measuring immediately
• 3 runs
• Find mean
• Dev machine
[Diagram: Web Request → Server → Cache → S3]
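The setup above bakes the cold-cache miss into the mean because measurement starts immediately. A toy sketch (the in-process cache and the 50 ms "S3" delay are made up for illustration) shows how a single un-warmed measurement skews the result:

```python
import time

CACHE = {}

def fetch_image(key):
    """Toy request path: the first access misses the cache and pays the S3 cost."""
    if key not in CACHE:
        time.sleep(0.05)          # simulated slow backend fetch on a cache miss
        CACHE[key] = b"image-bytes"
    return CACHE[key]

latencies = []
for _ in range(1000):
    start = time.perf_counter()
    fetch_image("cat.jpg")
    latencies.append(time.perf_counter() - start)

# "Start measuring immediately" folds the one cold-cache miss into the mean;
# discarding a warm-up period reveals the steady-state latency instead.
mean_all = sum(latencies) / len(latencies)
mean_warm = sum(latencies[1:]) / len(latencies[1:])
print(f"mean with cold start: {mean_all * 1000:.3f} ms")
print(f"mean after warm-up:   {mean_warm * 1000:.3f} ms")
```

The same reasoning applies to JIT warm-up, connection pools, and OS page caches: decide deliberately whether cold-start behavior is part of what you are measuring.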
37. "Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs ... Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." -- Donald Knuth
38–41. Wrong About What Matters
• Premature optimization
• Unrepresentative workloads
• Memory pressure
• Load balancing
• Reproducibility of measurements
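On the reproducibility point: a single benchmark number carries no error bar, so it cannot distinguish a real regression from run-to-run noise. A minimal sketch (the synthetic `run_benchmark` stands in for a full benchmark run):

```python
import random
import statistics

random.seed(0)

def run_benchmark():
    """Stand-in for one complete benchmark run; returns a mean latency in ms."""
    return statistics.mean(random.gauss(20, 4) for _ in range(100))

# Repeating the run and reporting the spread makes it possible to tell
# whether a "10% improvement" is signal or noise.
runs = [run_benchmark() for _ in range(10)]
mean = statistics.mean(runs)
stdev = statistics.stdev(runs)
print(f"mean of runs: {mean:.2f} ms  (stdev {stdev:.2f} ms over {len(runs)} runs)")
```

If the claimed improvement is smaller than the run-to-run standard deviation, the benchmark has not demonstrated anything yet.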
50–51. Microbenchmarking: Blessing & Curse
• Choose your N wisely
• Measure side effects
• Beware of clock resolution
• Dead Code Elimination
• Constant work per iteration
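The clock-resolution and choose-your-N points can be sketched in a few lines (the `op` function and `N` are arbitrary placeholders; in a JIT-compiled language you would also need a result sink to defeat dead-code elimination, which CPython does not perform):

```python
import time

def op():
    # Constant work per iteration: same input every call.
    return sum(range(100))

# Timing a single call is dominated by timer resolution and call overhead;
# batching N iterations and dividing amortizes both, which is why choosing
# N wisely matters (timeit does this internally).
start = time.perf_counter()
op()
single = time.perf_counter() - start

N = 100_000
start = time.perf_counter()
sink = 0
for _ in range(N):
    sink += op()   # accumulate the result so the work cannot be skipped
batched = (time.perf_counter() - start) / N

print(f"single-shot estimate: {single * 1e9:.0f} ns")
print(f"batched estimate:     {batched * 1e9:.0f} ns")
```

Note the tension: a larger N smooths out clock-resolution error but also hides per-call variance, and if the work per iteration is not constant (e.g. a growing data structure), dividing by N measures nothing meaningful.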
53. Follow-up Material
• How NOT to Measure Latency by Gil Tene
  – http://www.infoq.com/presentations/latency-pitfalls
• Taming the Long Latency Tail on highscalability.com
  – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html
• Performance Analysis Methodology by Brendan Gregg
  – http://www.brendangregg.com/methodology.html
• Silverman's Mode Detection Method by Matt Adereth
  – http://adereth.github.io/blog/2014/10/12/silvermans-mode-detection-method-explained/