Benchmarking: You're Doing It Wrong (StrangeLoop 2014)


Knowledge of how to set up good benchmarks is invaluable for understanding the performance of a system. Writing correct and useful benchmarks is hard, and verifying the results is difficult and error-prone. Done right, benchmarks guide teams toward improving their systems' performance. Done wrong, hours of effort may yield a worse-performing application, upset customers, or worse. In this talk, we discuss what you need to know to write better benchmarks. We look at examples of bad benchmarks and the biases that can invalidate measurements, so that we can apply our new-found skills correctly and avoid such pitfalls in the future.

Published in: Technology

Benchmarking: You're Doing It Wrong (StrangeLoop 2014)

  1. Benchmarking: You’re Doing It Wrong Aysylu Greenberg @aysylu22
  2. In Memes…
  3. To Write Good Benchmarks… Need to Be Full Stack
  4. What’s a Benchmark? How fast? Your process vs. goal. Your process vs. best practices.
  5. Today • How Not to Write Benchmarks • Benchmark Setup & Results: - Wrong about the machine - Wrong about stats - Wrong about what matters • Becoming Less Wrong
  6. HOW NOT TO WRITE BENCHMARKS
  7. Website Serving Images • Access 1 image 1000 times • Latency measured for each access • Start measuring immediately • 3 runs • Find mean • Dev machine (Diagram: Web Request → Server → Cache → S3)
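As an editorial illustration (not the speaker's code), here is roughly what this naive benchmark might look like in Java; the image URL is hypothetical. Note that it starts timing immediately, hits a single hot image so every cache on the path stays warm, and collapses everything into a mean:

```java
import java.io.InputStream;
import java.net.URL;

// Naive benchmark as described on the slide: one hot image, no warmup,
// mean over 3 runs on a dev machine. Every pitfall that follows applies.
public class NaiveImageBenchmark {
    static final String IMAGE = "http://example.com/cat.jpg"; // hypothetical URL

    public static void main(String[] args) throws Exception {
        double meanOfRuns = 0;
        for (int run = 0; run < 3; run++) {
            long[] latencies = new long[1000];
            for (int i = 0; i < 1000; i++) {       // starts measuring immediately
                long start = System.nanoTime();
                try (InputStream in = new URL(IMAGE).openStream()) {
                    while (in.read() != -1) { }    // drain the response
                }
                latencies[i] = System.nanoTime() - start;
            }
            long sum = 0;
            for (long l : latencies) sum += l;
            meanOfRuns += (sum / 1000.0) / 3.0;    // the mean hides the distribution
        }
        System.out.printf("mean latency: %.2f ms%n", meanOfRuns / 1e6);
    }
}
```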
  8. WHAT’S WRONG WITH THIS BENCHMARK?
  9. You’re wrong about the machine BENCHMARK SETUP & RESULTS: COMMON PITFALLS
  10. Wrong About the Machine • Cache, cache, cache, cache!
  11.–12. It’s Caches All The Way Down
  13.–17. Caches in Benchmarks (Prof. Saman Amarasinghe, MIT 2009)
  18. Website Serving Images (repeat of slide 7)
  19. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & Timing
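A minimal warmup sketch (my addition, assuming a hand-rolled harness like the one above): run unmeasured iterations first so caches, connection pools, and, on the JVM, the JIT reach steady state before timing starts. The iteration counts are up to the caller:

```java
// Hedged sketch: discard warmup iterations before recording latencies.
public class WarmupHarness {
    static long[] measure(Runnable op, int warmupIters, int measuredIters) {
        for (int i = 0; i < warmupIters; i++) {
            op.run();                      // not timed: warms caches and JIT
        }
        long[] latencies = new long[measuredIters];
        for (int i = 0; i < measuredIters; i++) {
            long start = System.nanoTime();
            op.run();
            latencies[i] = System.nanoTime() - start;
        }
        return latencies;
    }
}
```

Hand-rolled harnesses like this remain fragile on the JVM; frameworks such as JMH, covered in Shipilëv's material on the follow-up slide, automate warmup and timing.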
  20. Website Serving Images (repeat of slide 7)
  21. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & Timing • Periodic interference
  22. Website Serving Images (repeat of slide 7)
  23. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & Timing • Periodic interference • Different specs in test vs. prod machines
  24. Website Serving Images (repeat of slide 7)
  25. Wrong About the Machine • Cache, cache, cache, cache! • Warmup & Timing • Periodic interference • Different specs in test vs. prod machines • Power mode changes
  26. You’re wrong about the stats BENCHMARK SETUP & RESULTS: COMMON PITFALLS
  27. Wrong About Stats • Too few samples
  28. Wrong About Stats (Chart: Convergence of Median on Samples; latency vs. time for a stable series and a decaying series, each with its running median)
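A minimal sketch of the idea I read into that chart (an assumption, since only the axis labels survive): keep sampling until the running median stops moving, rather than fixing the sample count up front:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.DoubleSupplier;

// Hedged sketch: sample until the running median stabilizes within a tolerance.
public class MedianConvergence {
    static double sampleUntilMedianConverges(DoubleSupplier sample, double tolerance,
                                             int minSamples, int maxSamples) {
        List<Double> values = new ArrayList<>();
        double lastMedian = Double.NaN;
        for (int i = 0; i < maxSamples; i++) {
            values.add(sample.getAsDouble());
            List<Double> sorted = new ArrayList<>(values);
            Collections.sort(sorted);
            double median = sorted.get(sorted.size() / 2);
            if (i >= minSamples && Math.abs(median - lastMedian) < tolerance) {
                return median;             // converged: enough samples taken
            }
            lastMedian = median;
        }
        return lastMedian;                 // budget exhausted; report best estimate
    }
}
```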
  29. Website Serving Images (repeat of slide 7)
  30. Wrong About Stats • Too few samples • Non-Gaussian
  31. Website Serving Images (repeat of slide 7)
  32. Wrong About Stats • Too few samples • Non-Gaussian • Multimodal distribution
  33. Multimodal Distribution (Histogram: # occurrences vs. latency, with modes near 5 ms and 10 ms annotated as the 50th and 99th percentiles)
  34. Wrong About Stats • Too few samples • Non-Gaussian • Multimodal distribution • Outliers
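Because the distribution is non-Gaussian and multimodal, a single mean is misleading. A minimal sketch (my addition, not the speaker's code) of reporting percentiles and the max instead, which keeps both modes and the outliers visible:

```java
import java.util.Arrays;

// Hedged sketch: summarize a latency run by percentiles and max, not the mean.
public class Percentiles {
    static void report(long[] latenciesNanos) {
        long[] sorted = latenciesNanos.clone();
        Arrays.sort(sorted);
        System.out.printf("p50=%.2f ms  p99=%.2f ms  max=%.2f ms%n",
                sorted[(int) (sorted.length * 0.50)] / 1e6,
                sorted[(int) (sorted.length * 0.99)] / 1e6,
                sorted[sorted.length - 1] / 1e6);
    }
}
```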
  35. You’re wrong about what matters BENCHMARK SETUP & RESULTS: COMMON PITFALLS
  36. Wrong About What Matters • Premature optimization
  37. “Programmers waste enormous amounts of time thinking about … the speed of noncritical parts of their programs … Forget about small efficiencies … 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” – Donald Knuth
  38. Wrong About What Matters • Premature optimization • Unrepresentative Workloads
  39. Wrong About What Matters • Premature optimization • Unrepresentative Workloads • Memory pressure
  40. The How BECOMING LESS WRONG
  41. Becoming Less Wrong User Actions Matter X > Y for workload Z with trade-offs A, B, and C – http://www.toomuchcode.org/
  42. Becoming Less Wrong Profiling, Code Instrumentation, Aggregate Over Logs & Traces
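A minimal sketch of the code-instrumentation approach, with hypothetical names of my own: time a real production code path and emit one log line per call, so latencies can later be aggregated over logs and traces instead of reproduced in an artificial harness:

```java
import java.util.function.Supplier;

// Hedged sketch: wrap a real code path, time it, and log each latency
// for offline aggregation (e.g., percentiles per operation over the logs).
public class Instrumented {
    static <T> T timed(String operation, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long micros = (System.nanoTime() - start) / 1_000;
            // A real system would use a structured logger or metrics library here.
            System.out.println("op=" + operation + " latency_us=" + micros);
        }
    }

    public static void main(String[] args) {
        // The byte[] allocation stands in for any production operation.
        byte[] image = timed("serve_image", () -> new byte[64 * 1024]);
        System.out.println("got " + image.length + " bytes");
    }
}
```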
  43. Microbenchmarking: Blessing & Curse + Quick & cheap + Answers narrow questions well – Often misleading results – Not representative of the program
  44. Microbenchmarking: Blessing & Curse • Choose your N wisely
  45. Choose Your N Wisely (Prof. Saman Amarasinghe, MIT 2009)
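The referenced slide is a chart that did not survive extraction; a minimal sketch of the point as I read it (an assumption, not the original figure): per-element cost jumps as the working set for a given N crosses L1, L2, and L3 cache sizes, so a single N can silently benchmark the cache instead of the code:

```java
// Hedged sketch: time a linear walk over arrays of growing size N and
// watch ns/element jump as the working set falls out of each cache level.
public class ChooseN {
    public static void main(String[] args) {
        for (int n = 1 << 10; n <= 1 << 24; n <<= 2) {  // 4 KB .. 64 MB of ints
            int[] data = new int[n];
            long sum = 0;
            long start = System.nanoTime();
            for (int pass = 0; pass < 10; pass++) {
                for (int i = 0; i < n; i++) {
                    sum += data[i];
                }
            }
            double nsPerElem = (System.nanoTime() - start) / (10.0 * n);
            System.out.printf("N=%,d ints: %.3f ns/element (sum=%d)%n", n, nsPerElem, sum);
        }
    }
}
```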
  46. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects
  47. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution
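A minimal sketch of checking timer granularity (my addition): if the measured operation takes close to or less than the clock's observable tick, individual timings are noise and the operation should be timed in batches:

```java
// Hedged sketch: estimate the smallest observable step of System.nanoTime().
public class ClockResolution {
    public static void main(String[] args) {
        long minDelta = Long.MAX_VALUE;
        for (int i = 0; i < 1_000_000; i++) {
            long t0 = System.nanoTime();
            long t1 = System.nanoTime();
            if (t1 > t0) {
                minDelta = Math.min(minDelta, t1 - t0);
            }
        }
        System.out.println("observable nanoTime step: ~" + minDelta + " ns");
        // Operations near or below this step must be timed in batches,
        // dividing total elapsed time by the batch size.
    }
}
```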
  48. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution • Dead Code Elimination
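A minimal sketch of the dead-code-elimination trap (illustrative, not from the talk): if a computed result is never used, the JIT may delete the very work being measured, making the loop look impossibly fast. Consuming the result defends against this; JMH's Blackhole exists for exactly this purpose:

```java
// Hedged sketch: a result nobody reads invites the JIT to delete the work.
public class DeadCodeDemo {
    static volatile double sink;           // writing here keeps results observable

    static double brokenNsPerOp(int iters) {
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            Math.sqrt(i);                  // discarded: may be eliminated entirely
        }
        return (System.nanoTime() - start) / (double) iters;
    }

    static double fixedNsPerOp(int iters) {
        double acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            acc += Math.sqrt(i);           // result feeds an accumulator
        }
        sink = acc;                        // observable side effect defeats DCE
        return (System.nanoTime() - start) / (double) iters;
    }

    public static void main(String[] args) {
        System.out.printf("broken: %.2f ns/op, fixed: %.2f ns/op%n",
                brokenNsPerOp(100_000_000), fixedNsPerOp(100_000_000));
    }
}
```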
  49. Microbenchmarking: Blessing & Curse • Choose your N wisely • Measure side effects • Beware of clock resolution • Dead Code Elimination • Constant work per iteration
  50. Non-Constant Work Per Iteration
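The slide title stands alone, so as an illustrative assumption: a classic instance is benchmarking an operation whose cost grows with accumulated state, so per-iteration time drifts across the run and the mean depends on how long you loop:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: each add(0, ...) shifts every existing element, so the
// per-iteration cost grows as the list grows; a mean over the whole run
// reflects the loop's history, not a steady-state cost.
public class NonConstantWork {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int batch = 0; batch < 10; batch++) {
            long start = System.nanoTime();
            for (int j = 0; j < 5_000; j++) {
                list.add(0, j);            // O(list.size()) arraycopy per insert
            }
            System.out.printf("size=%,d  batch took %.1f ms%n",
                    list.size(), (System.nanoTime() - start) / 1e6);
        }
    }
}
```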
  51. Follow-up Material • How NOT to Measure Latency by Gil Tene – http://www.infoq.com/presentations/latency-pitfalls • Taming the Long Latency Tail on highscalability.com – http://highscalability.com/blog/2012/3/12/google-taming-the-long-latency-tail-when-more-machines-equal.html • Performance Analysis Methodology by Brendan Gregg – http://www.brendangregg.com/methodology.html • Robust Java benchmarking by Brent Boyer – http://www.ibm.com/developerworks/library/j-benchmark1/ – http://www.ibm.com/developerworks/library/j-benchmark2/ • Benchmarking articles by Aleksey Shipilëv – http://shipilev.net/#benchmarking
  52. Takeaway #1: Cache
  53. Takeaway #2: Outliers
  54. Takeaway #3: Workload
  55. Benchmarking: You’re Doing It Wrong Aysylu Greenberg @aysylu22
