2. We all care about performance evaluation
We’ve been doing it wrong
STABILIZER
Repeated runs and error bars are not enough
We’re not measuring what we thought
3. STABILIZER
Memory layout affects performance
changing a program changes its layout
no way to measure effect of change in isolation
STABILIZER eliminates the effect of layout
enables sound performance evaluation
Case Studies
evaluation of LLVM’s optimizations with STABILIZER
25. Layout biases measurement
Mytkowicz et al. (ASPLOS’09)
Link Order: changes function addresses
Environment Variable Size: moves the program stack
26. Layout biases measurement
Mytkowicz et al. (ASPLOS’09)
The effect of link order and environment variable size can be larger than the impact of -O3
27. Blame the cache
int main(int argc, char **argv) {
    topFrame = (void**)__builtin_frame_address(0);
    setHandler(Trap::TrapSignal, onTrap);
    setHandler(SIGALRM, onTimer);
    setHandler(SIGSEGV, onFault);
    for (Function* f : functions) {
        f->setTrap();
    }
    setTimer(interval);
    int r = stabilizer_main(argc, argv);
    return r;
}

void setTimer(int msec) {
    struct itimerval timer;
    timer.it_value.tv_sec = (msec - msec % 1000) / 1000;
    timer.it_value.tv_usec = 1000 * (msec % 1000);
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 0;
    setitimer(ITIMER_REAL, &timer, 0);
}

static void flush_icache(void* begin, size_t size) {
    uintptr_t p = (uintptr_t)begin & ~15UL;
    for (size_t i = 0; i < size; i += 16) {
        asm("icbi 0,%0" : : "r"(p));
        p += 16;
    }
    asm("isync");
}

DataHeapType* getDataHeap() {
    static char buf[sizeof(DataHeapType)];
    static DataHeapType* _theDataHeap = new (buf) DataHeapType;
    return _theDataHeap;
}
28. Blame the cache
conflict: map to the same cache set
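To make the conflict concrete, here is a minimal sketch of how an address is assigned to a cache set. The geometry (64-byte lines, 64 sets) is an assumption chosen for illustration, and `cache_set_index` and `same_cache_set` are hypothetical helpers, not part of STABILIZER.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: which cache set does an address fall into?
// Assumes a set-associative cache indexed directly by address bits.
constexpr std::size_t cache_set_index(std::uintptr_t addr,
                                      std::size_t line_size,
                                      std::size_t num_sets) {
    return (addr / line_size) % num_sets;
}

// Addresses that land in the same set compete for that set's few ways.
// Defaults (64-byte lines, 64 sets) are illustrative assumptions.
inline bool same_cache_set(std::uintptr_t a, std::uintptr_t b,
                           std::size_t line_size = 64,
                           std::size_t num_sets = 64) {
    assert(line_size > 0 && num_sets > 0);
    return cache_set_index(a, line_size, num_sets) ==
           cache_set_index(b, line_size, num_sets);
}
```

With this assumed geometry, two addresses 4096 bytes apart (64 lines of 64 bytes) fall into the same set, while shifting one of them by a single cache line moves it to a neighboring set. This is why a small change in where code or data is placed can create or remove a conflict.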
39. it’s faster
But it ran faster!
What if we only use this layout?
Upgrade libc
Changes layout
40. it’s faster
Change Username
Changes layout
41. Layout is Brittle
Run in a new directory
Changes layout
42. Layout is Brittle
Layout biases measurement
Mytkowicz et al. (ASPLOS’09)
Can we eliminate the effect of layout?
43. But it ran faster!
Can we eliminate the effect of layout?
YES
44. STABILIZER
Memory layout affects performance
makes performance evaluation difficult
STABILIZER eliminates the effect of layout
enables sound performance evaluation
Case Studies
evaluation of LLVM’s optimizations with STABILIZER
60. this speedup is real
not due to the effect on memory layout
Sound Performance Evaluation
STABILIZER randomizes layout repeatedly
what does re-randomization do?
[Figure: histogram of observed runtimes. x-axis: Time (s), 85.0 to 95.0; y-axis: Percent of Observed Runtimes, 0% to 40%]
61. STABILIZER randomizes layout repeatedly
one random layout per run
[Figure: distribution of runtimes. x-axis: Time (s), 350 to 400; y-axis: Probability Density, 0.0 to 0.4]
63. STABILIZER generates a new random layout every ½ second
Total execution time is the sum of all periods
STABILIZER randomizes layout repeatedly
64. STABILIZER generates a new random layout every ½ second
Total execution time is the sum of all periods
The sum of a sufficient number of independent, identically distributed random variables is approximately normally distributed.
STABILIZER randomizes layout repeatedly
67. Central Limit Theorem
The sum of a sufficient number of independent, identically distributed random variables is approximately normally distributed.
execution times are normally distributed
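The effect of summing many periods can be checked with a small simulation. This is a sketch under stated assumptions, not STABILIZER's code: each period's time is drawn from an exponential distribution as a stand-in for "time under one random layout", and `simulated_run`, `skewness`, and `run_time_skewness` are hypothetical names. Skewness is used as a rough symmetry check because a normal distribution has skewness 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One simulated run: total time is the sum of `periods` per-period times.
// Each period is drawn independently from the same skewed distribution
// (exponential with rate 1.0, an assumption made for illustration).
double simulated_run(std::size_t periods, std::mt19937& rng) {
    assert(periods > 0);
    std::exponential_distribution<double> period_time(1.0);
    double total = 0.0;
    for (std::size_t i = 0; i < periods; ++i) total += period_time(rng);
    return total;
}

// Sample skewness: close to 0 for a symmetric distribution such as the normal.
double skewness(const std::vector<double>& xs) {
    assert(xs.size() > 1);
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= xs.size();
    double m2 = 0.0, m3 = 0.0;
    for (double x : xs) {
        const double d = x - mean;
        m2 += d * d;
        m3 += d * d * d;
    }
    m2 /= xs.size();
    m3 /= xs.size();
    return m3 / std::pow(m2, 1.5);
}

// Skewness of the total run time across `runs` simulated runs.
double run_time_skewness(std::size_t periods, std::size_t runs, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<double> totals;
    totals.reserve(runs);
    for (std::size_t r = 0; r < runs; ++r) {
        totals.push_back(simulated_run(periods, rng));
    }
    return skewness(totals);
}
```

With one period per run the totals keep the skew of the underlying distribution (theoretically 2 for the exponential); with 100 periods per run the theoretical skewness drops to 2/√100 = 0.2, so the distribution of run times is already close to symmetric.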
68. STABILIZER
Memory layout affects performance
makes performance evaluation difficult
STABILIZER eliminates the effect of layout
enables sound performance evaluation
Case Studies
evaluation of LLVM’s optimizations with STABILIZER
70. Case Studies
evaluation of LLVM’s optimizations with STABILIZER
first, build benchmarks with STABILIZER
on each benchmark
across the whole benchmark suite
91. Speedup of -O3 over -O2
[Figure: bar chart of per-benchmark speedup. y-axis: Speedup, 0.0% to 1.5%; bars marked Significant: Yes or No. Benchmarks: bzip2, gobmk, zeusmp, libquantum, wrf, astar, mcf, hmmer, milc, namd, gcc, lbm, gromacs, h264ref, cactusADM, perlbench, sphinx3, sjeng]
108. If -O2 = -O3
what is the probability of measuring these differences?
Analysis of Variance
aov(time~opt+Error(benchmark/opt), times)
111. If -O2 = -O3
what is the probability of measuring these differences?
Analysis of Variance
If p-value ≤ 5% we reject the null hypothesis
112. Analysis of Variance
If p-value ≤ 5% we reject the null hypothesis
-O3 vs -O2
p-value = 26.4%
are we 73.6% confident?
one in four experiments will show an effect that does not exist!
113. Analysis of Variance
If p-value ≤ 5% we reject the null hypothesis
-O3 vs -O2
p-value = 26.4%
fail to reject the null hypothesis
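The decision rule can be sketched in code. The example below is a simplification and not the `aov` model from the slides: with only two optimization levels, it assumes the comparison can be approximated by a paired test on per-benchmark mean times. `paired_t_statistic` and `reject_null` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Paired t statistic for per-benchmark mean times under two configurations.
// a[i] and b[i] must belong to the same benchmark.
// Assumes the per-benchmark differences are not all identical
// (otherwise the variance is zero and the statistic is undefined).
double paired_t_statistic(const std::vector<double>& a,
                          const std::vector<double>& b) {
    assert(a.size() == b.size() && a.size() >= 2);
    const std::size_t n = a.size();
    double mean = 0.0;
    for (std::size_t i = 0; i < n; ++i) mean += a[i] - b[i];
    mean /= n;
    double ss = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = (a[i] - b[i]) - mean;
        ss += d * d;
    }
    const double variance = ss / (n - 1);   // sample variance of the differences
    return mean / std::sqrt(variance / n);  // mean difference over its standard error
}

// Decision rule from the slides: reject the null hypothesis only if p <= alpha.
bool reject_null(double p_value, double alpha = 0.05) {
    return p_value <= alpha;
}
```

The statistic is compared against Student's t distribution with n − 1 degrees of freedom to obtain a p-value. With the p-value of 26.4% reported on the slide, `reject_null(0.264)` returns false, which is the "fail to reject" outcome.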
114. Analysis of Variance
If p-value ≤ 5% we reject the null hypothesis
The effect of -O3 over -O2 is indistinguishable from noise
Did STABILIZER hide the effect?
115. Did STABILIZER hide the effect?
[Figure: execution time, showing runtime with -O0, -O1, -O2, and -O3]
117. Did STABILIZER hide the effect?
[Figure: execution time, showing runtime with -O0, -O1, -O2, and -O3, each with STABILIZER, annotated with speedups]
119. STABILIZER
Memory layout affects performance
makes performance evaluation difficult
STABILIZER eliminates the effect of layout
random layout enables sound performance evaluation
Case Studies
showed that -O3 does not have a statistically significant effect across our benchmarks