Joker 2015 - Tagir Valeev - What Are We Actually Measuring?

What Are We Actually Measuring?
Tagir Valeev
Institute of Informatics Systems, SB RAS

Disclaimer
I may be lying!

StreamEx
https://github.com/amaembo/streamex

How much slower is the Stream API?
static long sumTwice(int max) {
    long sum = 0;
    for(int i=1; i<=max; i++) sum+=i*2;
    return sum;
}
static long sumTwiceStream(int max) {
    return IntStream.rangeClosed(1, max)
        .mapToLong(x -> x*2).sum();
}

A question from StackOverflow
http://stackoverflow.com/q/31761271/4856258 (deleted by the author)
In Java 8 Predicate<Integer> is faster than IntPredicate
In Java 8 it is suggested that we should use IntPredicate rather than Predicate<Integer> and same for
other premitive types as former one reduces the overhead related to autoboxing but when i run the
following code. I get results shockingly opposite as IntPredicate is 30-50 times slower than Predicate
on my system.

A question from StackOverflow
Long start = System.currentTimeMillis();
IntPredicate evenNumPredicate = (int i) -> i % 2 == 0;
evenNumPredicate.test(1000);
System.out.println(System.currentTimeMillis()-start);
start = System.currentTimeMillis();
Predicate<Integer> evenNumPredicate1 =
(Integer i) -> i % 2 == 0;
evenNumPredicate1.test(1000);
System.out.println(System.currentTimeMillis()-start);

Community reaction
This benchmark is shockingly bad. It could be
improved in several significant ways and still be a
bad benchmark.
– Marko Topolnik
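One reason the quoted benchmark misleads, offered as an illustration rather than a full diagnosis: the first lambda evaluated in a JVM process pays a one-time cost for bootstrapping the lambda machinery, so whichever predicate is timed first tends to look slower. The sketch below times the same two predicates with System.nanoTime(); the class and method names are hypothetical, and a single timed call is still not a sound measurement.

```java
import java.util.function.IntPredicate;
import java.util.function.Predicate;

public class FirstLambdaCost {
    // Creates an IntPredicate, calls it once, and returns the elapsed nanoseconds.
    static long timePrimitive() {
        long start = System.nanoTime();
        IntPredicate even = i -> i % 2 == 0;
        boolean result = even.test(1000);
        long elapsed = System.nanoTime() - start;
        if (!result) throw new AssertionError("1000 should be even");
        return elapsed;
    }

    // Same measurement for the boxed Predicate<Integer>.
    static long timeBoxed() {
        long start = System.nanoTime();
        Predicate<Integer> even = i -> i % 2 == 0;
        boolean result = even.test(1000);
        long elapsed = System.nanoTime() - start;
        if (!result) throw new AssertionError("1000 should be even");
        return elapsed;
    }

    public static void main(String[] args) {
        // Unlike the original question, the boxed predicate is timed first here.
        long first = timeBoxed();
        long second = timePrimitive();
        System.out.printf("Predicate<Integer> (timed first): %d ns%n", first);
        System.out.printf("IntPredicate (timed second):      %d ns%n", second);
    }
}
```

Swapping the two calls in main is a quick way to see whether the order, rather than the predicate type, dominates the numbers.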

The naive approach
public static void main(String[] args) {
    long startSimple = System.nanoTime();
    long resultSimple = sumTwice(10_000_000);
    long endSimple = System.nanoTime();
    System.out.printf("Simple: %d; time=%8.3fms%n",
        resultSimple, (endSimple-startSimple)/1_000_000.0);
    long startStream = System.nanoTime();
    long resultStream = sumTwiceStream(10_000_000);
    long endStream = System.nanoTime();
    System.out.printf("Stream: %d; time=%8.3fms%n",
        resultStream, (endStream-startStream)/1_000_000.0);
}

Results of the naive approach
Simple: 100000010000000; time= 8.286ms
Stream: 100000010000000; time= 57.774ms
-Xint
Simple: 100000010000000; time= 136.347ms
Stream: 100000010000000; time=1647.296ms
-XX:-UseOnStackReplacement
Simple: 100000010000000; time= 150.321ms
Stream: 100000010000000; time= 374.367ms
-XX:-UseOnStackReplacement -XX:-UseLoopCounter
Simple: 100000010000000; time= 136.584ms
Stream: 100000010000000; time= 364.105ms


The interpreter and the JIT compiler

On-Stack Replacement (OSR)
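A small way to observe OSR, as a sketch: the method below is entered exactly once, so the JIT cannot wait for the next invocation and has to switch the running loop to compiled code. The class name OsrDemo and the method hotLoop are hypothetical; the loop body is the same as in sumTwice.

```java
public class OsrDemo {
    // One long-running loop in a method that is called a single time.
    static long hotLoop(int max) {
        long sum = 0;
        for (int i = 1; i <= max; i++) sum += i * 2;
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = hotLoop(10_000_000);
        long elapsed = System.nanoTime() - start;
        System.out.printf("result=%d; time=%.3fms%n", result, elapsed / 1_000_000.0);
    }
}
```

Run with -XX:+PrintCompilation, a HotSpot JVM would typically report a compilation of OsrDemo::hotLoop marked with %. With -XX:-UseOnStackReplacement the loop should finish in the interpreter, which matches the naive "Simple" timing of about 150 ms being close to the -Xint one.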


Stream operations
return IntStream
    .rangeClosed(1, max) // stream creation
    .mapToLong(x -> x*2) // intermediate operation
    .sum(); // terminal operation
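The split into intermediate and terminal operations matters because nothing runs until the terminal operation is called. The sketch below counts how often the mapping lambda is invoked; LazyStreamDemo, pipeline and calls are hypothetical names introduced only for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

public class LazyStreamDemo {
    static final AtomicInteger calls = new AtomicInteger();

    // Builds the pipeline without running it; the lambda counts its own invocations.
    static LongStream pipeline(int max) {
        return IntStream.rangeClosed(1, max)
                .mapToLong(x -> {
                    calls.incrementAndGet();
                    return x * 2;
                });
    }

    public static void main(String[] args) {
        LongStream stream = pipeline(5);
        System.out.println("calls before sum(): " + calls.get());
        long sum = stream.sum(); // the terminal operation drives the whole pipeline
        System.out.println("sum=" + sum + "; calls after sum(): " + calls.get());
    }
}
```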

Stream call hierarchy
LongPipeline::sum
  LongPipeline::reduce
    AbstractPipeline::evaluate
      ReduceOp::evaluateSequential
        AbstractPipeline::wrapAndCopyInto
          AbstractPipeline::copyInto
            Spliterator.OfInt::forEachRemaining
              RangeIntSpliterator::forEachRemaining (loop)
                IntPipeline$5$1::accept (mapToLong)
                  generated λ-class
                    λ-function x -> x*2
                  ReducingSink::accept (reduce)
                    Long::sum
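The hierarchy can be checked on a live JVM by capturing a stack trace from inside the lambda. This is a sketch with hypothetical names (StreamStackDemo, framesInsideLambda); the exact frames depend on the JDK version, and HotSpot hides the generated lambda class from stack traces by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class StreamStackDemo {
    // Runs a one-element pipeline and records the stack as seen from inside the lambda.
    static List<String> framesInsideLambda() {
        List<String> frames = new ArrayList<>();
        IntStream.rangeClosed(1, 1)
                .mapToLong(x -> {
                    for (StackTraceElement e : new Throwable().getStackTrace()) {
                        frames.add(e.getClassName() + "::" + e.getMethodName());
                    }
                    return x * 2;
                })
                .sum();
        return frames;
    }

    public static void main(String[] args) {
        // Frames are printed innermost first, the reverse of the order on the slide.
        framesInsideLambda().forEach(System.out::println);
    }
}
```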

[Diagram: the call chain between LongPipeline::sum and Long::sum, labeled "MAGIC"]


JMH
• Documentation and samples:
http://openjdk.java.net/projects/code-tools/jmh/
• $ mvn archetype:generate
-DinteractiveMode=false -DarchetypeGroupId=org.openjdk.jmh
-DarchetypeArtifactId=jmh-java-benchmark-archetype
-DgroupId=org.sample -DartifactId=test -Dversion=1.0
• $ mvn clean install
• pom.xml: <javac.target>1.6</javac.target>

JMH benchmark
public class MyBenchmark {
    @Benchmark
    public long stream() {
        return sumTwiceStream(10_000_000);
    }
    @Benchmark
    public long simple() {
        return sumTwice(10_000_000);
    }
    ...
}

JMH benchmark
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(5)
@State(Scope.Benchmark)
public class MyBenchmark {
    ...
}
[jmhtest/target]$ java -jar benchmark.jar >out.txt

JMH benchmark: results
# JMH 1.11.1 (released 7 days ago)
# VM version: JDK 1.8.0_60, VM 25.60-b23
...
Benchmark           Mode  Cnt  Score   Error  Units
MyBenchmark.simple  avgt   50  4.535 ± 0.009  ms/op
MyBenchmark.stream  avgt   50  4.123 ± 0.009  ms/op
____________________________________________________________
For comparison, the naive results:
Simple: 100000010000000; time= 8.286ms
Stream: 100000010000000; time= 57.774ms


JMH: adding a parameter
@Param({"100000", "1000000", "10000000"})
private int n;
@Benchmark
public long stream() {
    return sumTwiceStream(n);
}
@Benchmark
public long simple() {
    return sumTwice(n);
}

JMH with a parameter: results
Benchmark           (n)     Score   Error  Units
MyBenchmark.simple  100000  0.048 ± 0.001  ms/op
MyBenchmark.stream  100000  0.567 ± 0.002  ms/op

n = 1_000_000
# Warmup Iteration 1: 0.446 ms/op
Iteration 1: 5.821 ms/op
...

n = 10_000_000
...

n = 10_000_000
[jmhtest/target]$ java -jar benchmark.jar -i 20 >out.txt
...
...

The naive approach in a loop
public static void main(String[] args) {
    for(int i=0; i<6000; i++) {
        long start = System.nanoTime();
        long result = sumTwiceStream(10_000_000);
        long end = System.nanoTime();
        System.out.printf("#%d: %d; time=%8.3fms%n", i,
            result, (end-start)/1_000_000.0);
    }
}

The naive approach in a loop: results
#0: 100000010000000; time= 63.484ms
#1: 100000010000000; time= 10.328ms
#2: 100000010000000; time= 3.931ms
#3: 100000010000000; time= 4.051ms
…
#5630: 100000010000000; time= 3.938ms
#5631: 100000010000000; time= 4.154ms
#5632: 100000010000000; time= 4.150ms
#5633: 100000010000000; time= 3.955ms
#5634: 100000010000000; time= 4.011ms
#5635: 100000010000000; time= 58.184ms
#5636: 100000010000000; time= 58.058ms
#5637: 100000010000000; time= 57.024ms

-XX:+PrintCompilation
 77   1      3  java.lang.String::equals (81 bytes)
 78   2      3  java.lang.String::hashCode (55 bytes)
...
 80  13  n   0  java.lang.System::arraycopy (native) (static)
...
 81  16  s   3  java.lang.StringBuffer::append (13 bytes)
...
106  68  !   3  java.lang.ref.ReferenceQueue::poll (28 bytes)
...
141 221  %   4  ...$RangeIntSpliterator::forEachRemaining @ 34 (65 bytes)
144 219  %   3  ...$RangeIntSpliterator::forEachRemaining @ -2 (65 bytes) made not entrant

-XX:+PrintCompilation: reading the log
148 221 % 4 ...::forEachRemaining @ 34 (65 bytes)
tstamp compile_id attrs comp_level name [@ osr_pos] (size) [status]
• tstamp – time in milliseconds since the start of execution
• compile_id – number of the compilation task in the queue
• attrs – attributes
• comp_level – tier of tiered compilation
• name – class and method name
• osr_pos – bytecode position at which OSR is performed
• size – size of the method's bytecode in bytes (or "native")
• status – additional information about the event
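The field layout above can be turned into a small parser. This is a sketch for the whitespace-collapsed form shown on the slides, not for every line HotSpot can print. It assumes tiered compilation, so that a comp_level column is present. The class name CompilationLogLine and its fields are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompilationLogLine {
    // tstamp compile_id attrs comp_level name [@ osr_pos] (size) [status]
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+)\\s+(\\d+)\\s+([%s!bn ]*?)\\s*(\\d)\\s+(\\S+)"
            + "(?:\\s+@\\s+(-?\\d+))?\\s+\\((\\d+ bytes|native)\\)\\s*(.*)$");

    final long tstamp;
    final int compileId;
    final String attrs;    // attribute characters with spaces removed, e.g. "%"
    final int level;
    final String name;
    final Integer osrPos;  // null when the line carries no "@ osr_pos" part
    final String size;     // "65 bytes" or "native"
    final String status;   // empty when nothing follows the size

    private CompilationLogLine(Matcher m) {
        tstamp = Long.parseLong(m.group(1));
        compileId = Integer.parseInt(m.group(2));
        attrs = m.group(3).replace(" ", "");
        level = Integer.parseInt(m.group(4));
        name = m.group(5);
        osrPos = m.group(6) == null ? null : Integer.valueOf(m.group(6));
        size = m.group(7);
        status = m.group(8).trim();
    }

    static CompilationLogLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException("Unrecognized line: " + line);
        return new CompilationLogLine(m);
    }

    boolean isOsr() {
        return attrs.contains("%");
    }

    public static void main(String[] args) {
        CompilationLogLine l = parse("148 221 % 4 ...::forEachRemaining @ 34 (65 bytes)");
        System.out.printf("t=%dms id=%d osr=%b level=%d pos=%s size=%s%n",
                l.tstamp, l.compileId, l.isOsr(), l.level, l.osrPos, l.size);
    }
}
```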

-XX:+PrintCompilation: attributes
• n – wrapper for a native method (not really a compilation)
• % – on-stack replacement
• s – the method is declared synchronized
• ! – the method has an exception handler
• b – compilation blocks execution

-XX:+PrintCompilation: comp_level
• 0 – none (interpreter / native wrapper)
• 1 – simple (C1 compiler)
• 2 – limited_profile (C1 compiler with invocation and loop iteration counters)
• 3 – full_profile (level 2 plus type profiling)
• 4 – C2 compiler
[Diagram labels: Interpreter, C1, C2]

-XX:+PrintCompilation: status
144 219 % 3 ...::forEachRemaining @ -2 (65 bytes) made not entrant
• made not entrant – "no entry": new calls can no longer enter this compiled version

-XX:+PrintCompilation: status
• made zombie – the method is no longer in use and is
ready to be removed

-XX:+TraceNMethodInstalls
nmethod: a JIT-compiled method (OSR or regular)
-XX:+UnlockDiagnosticVMOptions
59 1 3 java.lang.String::hashCode (55 bytes)
Installing method (3) java.lang.String.hashCode()I
59 2 3 java.lang.String::equals (81 bytes)
Installing method (3) java.lang.String.equals(Ljava/lang/Object;)Z
60 3 3 java.lang.String::indexOf (70 bytes)
60 6 n 0 java.lang.System::arraycopy (native) (static)
Installing method (3) java.lang.String.indexOf(II)I
61 4 3 java.lang.Object::<init> (1 bytes)
Installing method (3) java.lang.Object.<init>()V

-XX:+PrintCompilation -XX:+TraceNMethodInstalls
#5630: 100000010000000; time= 3.896ms
22939 567 4 Test::sumTwiceStream (21 bytes)
#5631: 100000010000000; time= 4.327ms
#5632: 100000010000000; time= 4.029ms
#5633: 100000010000000; time= 4.170ms
22954 474 3 Test::sumTwiceStream (21 bytes) made not entrant
Installing method (4) Test.sumTwiceStream(I)J
#5634: 100000010000000; time= 4.072ms
22956 568 4 j.u.s.LongPipeline$$...::applyAsLong (6 bytes)
22956 204 3 j.u.s.LongPipeline$$...::applyAsLong (6 bytes) made not entrant
Installing method (4) j.u.s.LongPipeline$$Lambda$2/142257191.applyAsLong(JJ)J
#5635: 100000010000000; time= 58.784ms

Why was the Test.sumTwiceStream method
recompiled?
-XX:Tier4InvocationThreshold=5000
-XX:Tier4BackEdgeThreshold=40000

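Inlining
// Before inlining: the loop calls mult on every iteration.
static long sumTwice(int max) {
    long sum = 0;
    for(int i=1; i<=max; i++)
        sum+=mult(i);
    return sum;
}
static int mult(int x) {
    return x*2;
}
// After inlining: the call is replaced by the body of mult.
static long sumTwice(int max) {
    long sum = 0;
    for(int i=1; i<=max; i++)
        sum+=i*2;
    return sum;
}
-XX:+PrintInlining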

-XX:+PrintInlining
j.u.s.Streams$RangeIntSpliterator::forEachRemaining @ 34 (65 bytes)
@ 44 j.u.s.IntPipeline$5$1::accept (23 bytes) inline (hot)
-> TypeProfile (55084/55084 counts) = j/u/s/IntPipeline$5$1
@ 12 …$$Lambda$1/321001045::applyAsLong (5 bytes) inline (hot)
-> TypeProfile (24272/24272 counts) = Test$$Lambda$1
@ 1 Test::lambda$sumTwiceStream$0 (5 bytes) inline (hot)
@ 17 j.u.s.ReduceOps$8ReducingSink::accept (19 bytes) inline (hot)
-> TypeProfile (24272/24272 counts) = j/u/s/ReduceOps$8ReducingSink
-> TypeProfile (7376/7376 counts) = j/u/s/LongPipeline$$Lambda$2
@ 2 java.lang.Long::sum (4 bytes) inline (hot)


JITWatch: compile chain
https://github.com/AdoptOpenJDK/jitwatch
[Screenshot: JITWatch compile chain, with x -> x*2 and Long::sum marked]

Compiled code (before 5600)
[Diagram: the call chain from sumTwiceStream via LongPipeline::sum to Long::sum; seven parts labeled C1, one labeled C2]

-XX:+PrintInlining
@ 12 j.u.s.Streams$RangeIntSpliterator::forEachRemaining (65 bytes) inline (hot)
@ 1 Test::lambda$sumTwiceStream$0 (5 bytes) inlining too deep
-> TypeProfile (8593/8593 counts) = java/util/stream/ReduceOps$8ReducingSink
@ 10 …$$Lambda$2/303563356::applyAsLong (6 bytes) inlining too deep
-> TypeProfile (11206/11206 counts) = java/util/stream/LongPipeline$$Lambda$2

JITWatch
Class: Test
Method: lambda$sumTwiceStream$0
JIT Compiled: Yes
Inlined: No, inlining too deep
Count: 7194
iicount: 5599
Bytes: 5
Prof factor: 1

Compiled code (after 5600)
[Diagram: the call chain from sumTwiceStream via LongPipeline::sum to Long::sum; three parts labeled C2]

java -XX:+UnlockDiagnosticVMOptions
-XX:+PrintFlagsFinal -version
...
uintx MaxGCMinorPauseMillis = 4294967295 {product}
uintx MaxGCPauseMillis = 4294967295 {product}
uintx MaxHeapFreeRatio = 100 {manageable}
uintx MaxHeapSize := 1069547520 {product}
intx MaxInlineLevel = 9 {product}
intx MaxInlineSize = 35 {product}
intx MaxJNILocalCapacity = 65536 {product}
intx MaxJavaStackTraceDepth = 1024 {product}
intx MaxJumpTableSize = 65000 {C2 product}
...
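The same values can also be read from inside a running program. A minimal sketch, assuming a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class VmFlags and the helper flagValue are hypothetical names, and the "n/a" fallback is a choice made only for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class VmFlags {
    // Returns the current value of a VM flag, or "n/a" if this JVM does not expose it.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException | LinkageError e) {
            return "n/a";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "MaxInlineLevel", "Tier4InvocationThreshold", "Tier4BackEdgeThreshold"
        };
        for (String name : names) {
            System.out.printf("%-26s = %s%n", name, flagValue(name));
        }
    }
}
```

The listing above shows MaxInlineLevel = 9 for JDK 1.8.0_60; defaults can differ between JDK versions, so the printed values should be read for the JVM actually under test.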

JMH; -XX:MaxInlineLevel=13

n = 1_000_000; MaxInlineLevel = 13
...
Previously it looked like this:
0.446 ms/op
0.414 ms/op
1.144 ms/op
5.729 ms/op
5.792 ms/op
5.821 ms/op
5.751 ms/op
5.733 ms/op
...
5.787 ms/op
5.829 ms/op

MaxInlineLevel=13
[Diagram: the call chain from sumTwiceStream via LongPipeline::sum to Long::sum, with a single C2 label]

TypeProfile

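Profile pollution
@Param({"0", "1", "2", "3"})
private int pollute;
@Setup
public void setup() {
    // No break statements: a higher value also runs every case below it.
    switch(pollute) {
    case 3:
        for(int i=0; i<1000; i++)
            IntStream.range(0,100).mapToLong(x -> x*3).sum();
    case 2:
        for(int i=0; i<1000; i++)
            ; // loop body not preserved in the slide text
    case 1:
        for(int i=0; i<1000; i++)
            ; // loop body not preserved in the slide text
    }
}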
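The idea behind the setup method can be reproduced without JMH on a deliberately simplified model: one shared method with a single virtual call site, warmed up with other lambda classes before the measured lambda runs. Everything here (PollutionDemo, sumMapped, pollute, the multipliers) is hypothetical and not the author's code. Whether the timings change, and by how much, depends on the JVM and its flags.

```java
import java.util.function.IntToLongFunction;

public class PollutionDemo {
    // Shared code with one virtual call site; its type profile is shared by all callers.
    static long sumMapped(int max, IntToLongFunction f) {
        long sum = 0;
        for (int i = 1; i <= max; i++) sum += f.applyAsLong(i);
        return sum;
    }

    // Feeds up to three other lambda classes through the shared call site.
    static void pollute(int kinds) {
        IntToLongFunction[] others = { x -> x * 3, x -> x * 5, x -> x * 7 };
        for (int k = 0; k < kinds && k < others.length; k++) {
            for (int i = 0; i < 1000; i++) sumMapped(100, others[k]);
        }
    }

    public static void main(String[] args) {
        int kinds = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        pollute(kinds);
        for (int run = 0; run < 10; run++) {
            long start = System.nanoTime();
            long result = sumMapped(10_000_000, x -> x * 2);
            long end = System.nanoTime();
            System.out.printf("pollute=%d #%d: %d; time=%8.3fms%n",
                    kinds, run, result, (end - start) / 1_000_000.0);
        }
    }
}
```

Running it with arguments 0 through 3 in separate JVM processes gives a rough picture; for numbers worth quoting, the JMH setup from the slides is the better tool.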

-XX:MaxInlineLevel=13 + pollution
Benchmark       (n)  (pollute)   Score   Error  Units
MB.stream    100000          0   0.038 ± 0.001  ms/op
MB.stream    100000          1   0.047 ± 0.001  ms/op
MB.stream    100000          2   0.510 ± 0.003  ms/op
MB.stream    100000          3   0.509 ± 0.003  ms/op
MB.stream   1000000          0   0.371 ± 0.002  ms/op
MB.stream   1000000          1   0.435 ± 0.021  ms/op
MB.stream   1000000          2   5.385 ± 0.067  ms/op
MB.stream   1000000          3   5.433 ± 0.042  ms/op
MB.stream  10000000          0   4.029 ± 0.019  ms/op
MB.stream  10000000          1   4.550 ± 0.129  ms/op
MB.stream  10000000          2  53.658 ± 0.812  ms/op
MB.stream  10000000          3  54.563 ± 0.207  ms/op

MaxInlineLevel=13 + Type pollution
[Diagram: the call chain from sumTwiceStream via LongPipeline::sum to Long::sum, with two C2 labels]

OpenJDK bugs
• https://bugs.openjdk.java.net/browse/JDK-8015416
– tier one should collect context-dependent split profiles
• https://bugs.openjdk.java.net/browse/JDK-8015417
– profile pollution after call through invokestatic to shared code

In reality
public static long sumTwiceOpt(int max) {
    return max*(max+1L);
}
MyBenchmark.opt 100000 0.003 ± 0.001 us/op
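The closed form follows from 2·(1 + 2 + ... + max) = max·(max + 1). The sketch below checks that all three variants from the talk agree; SumTwiceCheck and allAgree are hypothetical names added for the check. The loop versions multiply in int, so they only match the closed form while i*2 does not overflow, which holds for the sizes used here.

```java
import java.util.stream.IntStream;

public class SumTwiceCheck {
    static long sumTwice(int max) {
        long sum = 0;
        for (int i = 1; i <= max; i++) sum += i * 2;
        return sum;
    }

    static long sumTwiceStream(int max) {
        return IntStream.rangeClosed(1, max).mapToLong(x -> x * 2).sum();
    }

    // Closed form; the 1L forces the multiplication to be done in long.
    static long sumTwiceOpt(int max) {
        return max * (max + 1L);
    }

    // True when the loop, the stream and the closed form give the same result.
    static boolean allAgree(int max) {
        long expected = sumTwiceOpt(max);
        return sumTwice(max) == expected && sumTwiceStream(max) == expected;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 100_000, 1_000_000, 10_000_000}) {
            System.out.printf("n=%d: %d; agree=%b%n", n, sumTwiceOpt(n), allAgree(n));
        }
    }
}
```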

But it was not all in vain!
JVM options
-Xint
-XX:UseOnStackReplacement
-XX:UseLoopCounter
-XX:MaxInlineLevel
-XX:Tier4InvocationThreshold
-XX:Tier4BackEdgeThreshold
-XX:UnlockDiagnosticVMOptions
-XX:PrintCompilation
-XX:TraceNMethodInstalls
-XX:PrintInlining
-XX:PrintFlagsFinal
Tools
JMH, JITWatch

Further reading
• Aleksey Shipilëv, "The Black Magic of (Java) Method Dispatch"
– http://shipilev.net/blog/2015/black-magic-method-dispatch/
• Vladimir Ivanov, "Dynamic JIT Compilation in the JVM"
– http://www.youtube.com/watch?v=oYu3HuIYDhI
• Aleksey Shipilëv, "Java Benchmarking: How to Read Two Timestamps!"
– http://shipilev.net/blog/2014/nanotrusting-nanotime/
– https://www.youtube.com/watch?v=Vb3jyHl3FNk
• Paul Sandoz answers "Erratic performance of Arrays.stream().map().sum()"
– http://stackoverflow.com/a/25851390/4856258

Thank you all!
Ponies from: http://fire-seeker.deviantart.com/
