SlideShare ist ein Scribd-Unternehmen logo
1 von 56
Downloaden Sie, um offline zu lesen
“Quantum” Performance Effects
v 2.0; November 2013

Sergey Kuksenko
sergey.kuksenko@oracle.com, @kuksenk0
The following is intended to outline our general product direction. It
is intended for information purposes only, and may not be
incorporated into any contract. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in
making purchasing decisions. The development, release, and timing
of any features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.

Slide 2/45.
Intro

Slide 3/45.
Intro: performance engineering

1. Computer Science

→

Software Engineering

Build software to meet functional requirements
Mostly don’t care about HW and data specifics
Abstract and composable, “formal science”

2. Performance Engineering
“Real world strikes back!”
Exploring complex interactions between hardware, software, and data
Based on empirical evidence, i.e. «natural science»

Slide 4/45.
Intro: what’s the difference?

architecture vs microarchitecture

Slide 5/45.
Intro: what’s the difference?

architecture vs microarchitecture

x86
AMD64(x86-64/Intel64)
ARMv7
....

Slide 5/45.

Nehalem
Sandy Bridge
Bulldozer
Bobcat
Cortex-A9
Intro: SUTs1
R
Intel○ Core

TM

i5-520M (Westmere) [↓2.0 GHz] 1x2x2

Ubuntu 10.04 (32-bit)
Xubuntu 12.04 (64-bit)

1 System
Slide 6/45.

Under Test
Intro: SUTs1
R
Intel○ Core

TM

i5-520M (Westmere) [↓2.0 GHz] 1x2x2

Ubuntu 10.04 (32-bit)
Xubuntu 12.04 (64-bit)

R
Intel○ Core

TM

i5-3320M (Ivy Bridge) [↓2.0 GHz] 1x2x2

Ubuntu 13.10 (64-bit)

Samsung Exynos 4412, ARMv7 (Cortex-A9) [1.6 GHz] 1x4x1
Linaro 12.11

1 System
Slide 6/45.

Under Test
Intro: SUTs (cont.)

AMD Opteron

TM

4274HE (Bulldozer/Valencia) [2.5 GHz] 2x8x1

Oracle Linux Server release 6.0 (64-bit)

R
R
Intel○ Xeon○ CPU E5-2680 (Sandy Bridge) [2.70 GHz] 2x8x2
Oracle Linux Server release 6.3 (64-bit)

Slide 7/45.
Intro: JVM

OpenJDK version “1.8.0-ea-lambda” build 83, 32-bits
OpenJDK version “1.8.0-ea-lambda” build 83, 64-bits
OpenJDK version “1.8.0-ea” build 116, 64-bits
Java HotSpot(tm) Embedded “1.8.0-ea-b79”

https://jdk8.java.net/download.html

Slide 8/45.
Intro: Demo code

https://github.com/kuksenko/quantum

Slide 9/45.
Intro: Required

JMH (Java Microbenchmark Harness)

http://openjdk.java.net/projects/code-tools/jmh/
Shipilёv A., “JMH: The Lesser of Two Evils”

http://shipilev.net/pub/#benchmarking

Slide 10/45.
Core

Slide 11/45.
demo1: double sum
private double [] A = new double [2048];
@GenerateMicroBenchmark
public double test1 () {
double sum = 0.0;
for ( int i = 0; i < A. length ; i ++) {
sum += A [i ];
}
return sum ;
}
@GenerateMicroBenchmark
public double testManualUnroll () {
double sum = 0.0;
for ( int i = 0; i < A. length ; i += 4) {
sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3];
}
return sum ;
}
Slide 12/45.
demo1: double sum
private double [] A = new double [2048];
@GenerateMicroBenchmark
public double test1 () {
double sum = 0.0;
for ( int i = 0; i < A. length ; i ++) {
sum += A [i ];
}
return sum ;
}
@GenerateMicroBenchmark
public double testManualUnroll () {
double sum = 0.0;
for ( int i = 0; i < A. length ; i += 4) {
sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3];
}
return sum ;
}
Slide 12/45.

327 ops/msec

699 ops/msec
demo1: looking into asm, test1
loop : addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
add
cmp
jl
Slide 13/45.

0 x10 (% edi ,% eax ,8) ,% xmm0
0 x18 (% edi ,% eax ,8) ,% xmm0
0 x20 (% edi ,% eax ,8) ,% xmm0
0 x28 (% edi ,% eax ,8) ,% xmm0
0 x30 (% edi ,% eax ,8) ,% xmm0
0 x38 (% edi ,% eax ,8) ,% xmm0
0 x40 (% edi ,% eax ,8) ,% xmm0
0 x48 (% edi ,% eax ,8) ,% xmm0
0 x50 (% edi ,% eax ,8) ,% xmm0
0 x58 (% edi ,% eax ,8) ,% xmm0
0 x60 (% edi ,% eax ,8) ,% xmm0
0 x68 (% edi ,% eax ,8) ,% xmm0
0 x70 (% edi ,% eax ,8) ,% xmm0
0 x78 (% edi ,% eax ,8) ,% xmm0
0 x80 (% edi ,% eax ,8) ,% xmm0
0 x88 (% edi ,% eax ,8) ,% xmm0
$0x10 ,% eax
% ebx ,% eax
loop :
demo1: looking into asm, testManualUnroll
loop : movsd
movsd
movsd
movsd
movsd
movsd
addsd
movsd
movsd
movsd
movsd
movsd
addsd
movsd
addsd
movsd
movsd

Slide 14/45.

% xmm0 ,0 x20 (% esp )
0 x48 (% eax ,% edx ,8) ,% xmm0
% xmm0 ,(% esp )
0 x40 (% eax ,% edx ,8) ,% xmm0
% xmm0 ,0 x8 (% esp )
0 x78 (% eax ,% edx ,8) ,% xmm0
0 x70 (% eax ,% edx ,8) ,% xmm0
0 x80 (% eax ,% edx ,8) ,% xmm1
% xmm1 ,0 x10 (% esp )
0 x88 (% eax ,% edx ,8) ,% xmm1
% xmm1 ,0 x18 (% esp )
0 x38 (% eax ,% edx ,8) ,% xmm4
0 x30 (% eax ,% edx ,8) ,% xmm4
0 x58 (% eax ,% edx ,8) ,% xmm5
0 x50 (% eax ,% edx ,8) ,% xmm5
0 x28 (% eax ,% edx ,8) ,% xmm1
0 x60 (% eax ,% edx ,8) ,% xmm2

movsd
movsd
movsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
add
cmp
jl

0 x68 (% eax ,% edx ,8) ,% xmm3
0 x20 (% eax ,% edx ,8) ,% xmm7
0 x18 (% eax ,% edx ,8) ,% xmm6
0 x10 (% eax ,% edx ,8) ,% xmm6
0 x10 (% esp ) ,% xmm0
% xmm7 ,% xmm6
0 x18 (% esp ) ,% xmm0
% xmm1 ,% xmm6
% xmm2 ,% xmm5
0 x20 (% esp ) ,% xmm6
% xmm3 ,% xmm5
0 x8 (% esp ) ,% xmm4
(% esp ) ,% xmm4
% xmm4 ,% xmm6
% xmm6 ,% xmm5
% xmm5 ,% xmm0
$0x10 ,% edx
% ebx ,% edx
loop :
demo1: measure time
private double [] A = new double [2048];
@GenerateMicroBenchmark
@BenchmarkMode ( Mode . AverageTime )
@OperationsPerInvocation (2048)
public double test1 () {
double sum = 0.0;
for ( int i = 0; i < A . length ; i ++) {
sum += A [i ];
}
return sum ;
}
@GenerateMicroBenchmark
@BenchmarkMode ( Mode . AverageTime )
@OperationsPerInvocation (2048)
public double testManualUnroll () {
double sum = 0.0;
for ( int i = 0; i < A . length ; i += 4) {
sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3];
}
return sum ;
}
Slide 15/45.
demo1: measure time
private double [] A = new double [2048];
@GenerateMicroBenchmark
@BenchmarkMode ( Mode . AverageTime )
@OperationsPerInvocation (2048)
public double test1 () {
double sum = 0.0;
for ( int i = 0; i < A . length ; i ++) {
sum += A [i ];
}
return sum ;
}
@GenerateMicroBenchmark
@BenchmarkMode ( Mode . AverageTime )
@OperationsPerInvocation (2048)
public double testManualUnroll () {
double sum = 0.0;
for ( int i = 0; i < A . length ; i += 4) {
sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3];
}
return sum ;
}
Slide 15/45.

1.5 nsec/op

0.7 nsec/op
demo1: measure time
private double [] A = new double [2048];
@GenerateMicroBenchmark
@BenchmarkMode ( Mode . AverageTime )
@OperationsPerInvocation (2048)
public double test1 () {
double sum = 0.0;
for ( int i = 0; i < A . length ; i ++) {
sum += A [i ];
}
return sum ;
}
@GenerateMicroBenchmark
@BenchmarkMode ( Mode . AverageTime )
@OperationsPerInvocation (2048)
public double testManualUnroll () {
double sum = 0.0;
for ( int i = 0; i < A . length ; i += 4) {
sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3];
}
return sum ;
}
Slide 15/45.

1.5 nsec/op
CPI =

∌2.5

0.7 nsec/op
CPI =

∌0.6
𝜇Arch:

x86

CISC vs RISC

Slide 16/45.
𝜇Arch:

x86

CISC and RISC
modern x86 CPU is not what it seems

All instructions (CISC) are dynamically translated into RISC-like
microoperations ( 𝜇ops).

Slide 16/45.
𝜇Arch:

Slide 17/45.

Nehalem internals
𝜇Arch:

Slide 18/45.

simplified scheme
𝜇Arch:

looking into instruction tables

Instrustion

1
𝑇 ℎ𝑟𝑜𝑱𝑔ℎ𝑝𝑱𝑡

ADDSP r,r

3

1

MULSD r,r

5

1

ADD/SUB r,r

1

0.33

MUL/IMUL r,r

Slide 19/45.

Latency

3

1
demo1: test1, looking into asm again
loop : addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
addsd
add
cmp
jl
Slide 20/45.

0 x10 (% edi ,% eax ,8) ,% xmm0
0 x18 (% edi ,% eax ,8) ,% xmm0
0 x20 (% edi ,% eax ,8) ,% xmm0
0 x28 (% edi ,% eax ,8) ,% xmm0
0 x30 (% edi ,% eax ,8) ,% xmm0
0 x38 (% edi ,% eax ,8) ,% xmm0
0 x40 (% edi ,% eax ,8) ,% xmm0
0 x48 (% edi ,% eax ,8) ,% xmm0
0 x50 (% edi ,% eax ,8) ,% xmm0
0 x58 (% edi ,% eax ,8) ,% xmm0
0 x60 (% edi ,% eax ,8) ,% xmm0
0 x68 (% edi ,% eax ,8) ,% xmm0
0 x70 (% edi ,% eax ,8) ,% xmm0
0 x78 (% edi ,% eax ,8) ,% xmm0
0 x80 (% edi ,% eax ,8) ,% xmm0
0 x88 (% edi ,% eax ,8) ,% xmm0
$0x10 ,% eax
% ebx ,% eax
loop :

1.5 nsec/op
∌ 3 clk/op
unroll by 16
19 insts
CPI ∌ 2.5
demo1: test1, other view

1.5 nsec/op
∌ 3 clk/op
unroll by 16
19 insts
CPI ∌ 2.5
Slide 21/45.
demo1: testManualUnroll

Slide 22/45.
demo1: testManualUnroll, other view

0.7 nsec/op
∌ 1.4 clk/op
unroll by 4x4
36 insts
CPI ∌ 0.6
Slide 23/45.
𝜇Arch:

Dependencies
1

Performance ILP

of many programs is limited by natural data

dependencies.

1 Instruction
Slide 24/45.

Level Parallelism
𝜇Arch:

Dependencies
1

Performance ILP

of many programs is limited by natural data

dependencies.

What to do?

Break the Dependency Chains!

1 Instruction
Slide 24/45.

Level Parallelism
demo1 back: breaking chains in a “right” way

Slide 25/45.
demo1 back: breaking chains in a “right” way
...
for ( int i = 0; i < A . length ; i ++) {
sum += A [i ];
}
return sum ;

Slide 25/45.
demo1 back: breaking chains in a “right” way
...
for ( int i = 0; i < A . length ; i ++) {
sum += A [i ];
}
return sum ;
...
for ( int i = 0; i < A . length ; i += 2) {
sum0 += A [i ];
sum1 += A [i + 1];
}
return sum0 + sum1 ;

Slide 25/45.
demo1 back: breaking chains in a “right” way
...
for ( int i = 0; i < A . length ; i ++) {
sum += A [i ];
}
return sum ;
...
for ( int i = 0; i < A . length ; i += 2) {
sum0 += A [i ];
sum1 += A [i + 1];
}
return sum0 + sum1 ;
...
for ( int i = 0; i < array . length ; i += 4) {
sum0 += A [i ];
sum1 += A [i + 1];
sum2 += A [i + 2];
sum3 += A [i + 3];
}
return ( sum0 + sum1 ) + ( sum2 + sum3 );
Slide 25/45.
demo1 back: double sum final results

Nehalem

AMD

ARM

testManualUnroll

0.70

0.45

3.31

test1

1.49

1.50

6.60

test2

0.75

0.79

4.25

test4

0.51

0.43

4.25

test8

0.51

0.25

2.55

time, nsec/op

Slide 26/45.
demo2: results
Nehalem

AMD

ARM

DoubleMul.test1

3.89

2.52

8.17

DoubleMul.test2

3.59

2.37

4.25

DoubleMul.test4

0.73

0.49

3.15

DoubleMul.test8

0.61

0.30

2.53

IntMul.test1

1.49

1.16

10.04

IntMul.test2

0.75

0.75

7.38

IntMul.test4

0.57

0.67

4.64

IntSum.test1

0.51

0.32

8.92

IntSum.test2

0.51

0.48

6.12

time, nsec/op
Slide 27/45.
Branches: to jump or not to jump

public int absSumBranch ( int a []) {
int sum = 0;
for ( int x : a) {
if (x < 0) {
sum -= x ;
} else {
sum += x ;
}
}
return sum ;
}

Slide 28/45.

loop :

L1 :
L2 :

mov
test
jl
add
jmp
sub
inc
cmp
jl

0 xc (% ecx ,% ebp ,4) ,% ebx
% ebx ,% ebx
L1
% ebx ,% eax
L2
% ebx ,% eax
% ebp
% edx ,% ebp
loop
Branches: to jump or not to jump

public int absSumPredicated ( int a []) {
int sum = 0;
for ( int x : a) {
sum += Math . abs ( x );
}
return sum ;
}

Slide 29/45.

loop :

mov
mov
neg
test
cmovl
add
inc
cmp
jl

0 xc (% ecx ,% ebp ,4) ,% ebx
% ebx ,% esi
% esi
% ebx ,% ebx
% esi ,% ebx
% ebx ,% eax
% ebp
% edx ,% ebp
Loop
demo3: results
Regular Pattern = (+, –)*
Nehalem

Ivy Bridge

Bulldozer

Cortex-A9

branch_regular

0.88

0.88

0.82

5.02

branch_shuffled

6.42

1.29

2.84

9.44

branch_sorted

0.90

0.95

0.99

5.02

predicated_regular

1.33

1.16

0.92

5.33

predicated_shuffled

1.33

1.16

0.96

9.3

predicated_sorted

1.33

1.16

0.96

5.65

time, nsec/op

Slide 30/45.
demo3: results
Regular Pattern = (+, +, –, +, –, –, +, –, –, +)*
Nehalem

Ivy Bridge

Bulldozer

Cortex-A9

branch_regular

1.55

1.14

0.98

5.02

branch_shuffled

6.38

1.33

2.33

9.53

branch_sorted

0.90

0.95

0.95

5.03

predicated_regular

1.33

1.16

0.95

5.33

predicated_shuffled

1.33

1.16

0.94

9.38

predicated_sorted

1.33

1.16

0.91

5.65

time, nsec/op

Slide 31/45.
demo4: && vs &
public int countConditional ( boolean [] f0 , boolean [] f1 ) {
int cnt = 0;
for ( int j = 0; j < SIZE ; j ++) {
for ( int i = 0; i < SIZE ; i ++) {
if ( f0 [i] && f1 [j ]) {
cnt ++;
}
}
shuffled
}
sorted
return cnt ;
}

Slide 32/45.

&&

4.4 nsec/op
1.5 nsec/op
demo4: && vs &
public int countLogical ( boolean [] f0 , boolean [] f1 ) {
int cnt = 0;
for ( int j = 0; j < SIZE ; j ++) {
shuffled
for ( int i = 0; i < SIZE ; i ++) {
if ( f0 [i] & f1 [j ]) {
sorted
cnt ++;
}
}
}
shuffled
return cnt ;
}
sorted

Slide 33/45.

&&

4.4 nsec/op
1.5 nsec/op

&

2.0 nsec/op
2.0 nsec/op
demo5: interface invocation cost
public interface I
{ public
...
public class C0 implements I { public
public class C1 implements I { public
public class C2 implements I { public
public class C3 implements I { public
...
@GenerateMicroBenchmark
@BenchmarkMode ( Mode . AverageTime )
@OperationsPerInvocation ( SIZE )
public int sum (I [] a ) {
int s = 0;
for (I i : a) {
s += i . amount ();
}
return s;
}

Slide 34/45.

int amount (); }
int
int
int
int

amount (){
amount (){
amount (){
amount (){

return
return
return
return

0;
1;
2;
3;

}
}
}
}

}
}
}
}
demo5: results

1 target

2 targets

3 targets

4 targets

1.0

1.1

7.7

7.8

regular

1.0

7.7

19.0

shuffled

7.4

22.7

24.8

sorted

time, nsec/op

Slide 35/45.
Not-a-Core

Slide 36/45.
Not-a-Core: HW Multithreading

Simultaneous multithreading, SMT

R
e.g. Intel○ Hyper-Threading Technology

Fine-grained temporal multithreading
e.g. CMT, Sun/Oracle ULTRASparc T1, T2, T3, T4, T5 ...

Slide 37/45.
back to demo1: Execution Units Saturation

1 thread

2 threads

4 threads

DoubleSum.test1

327

654

1279

DoubleSum.test2

647

1293

1865

DoubleSum.test4

957

1916

1866

DoubleSum.testManualUnroll

699

1398

1432

overall throughput, ops/msec

Slide 38/45.
demo6: show

All eyes on the the screen!

Slide 39/45.
demo6: show

All eyes on the the screen!

Slide 39/45.
demo6: HDivs.heavy* results on Nehalem
1 thread
int

180

double

90

throughput, ops/ 𝜇sec
2 threads
-cpu 1,3

(double, int)

(90, 90)

(90, 90)

(90, 90)

(45, 45)

(45, 45)

(90, 180)

(double, double)

-cpu 3

(180, 180)

(int, int)

-cpu 2,3

(81, 18)

(90, 45)

throughput, ops/ 𝜇sec

Slide 40/45.
demo6: HDivs.heavy* results on AMD
1 thread
int

128

double

306

throughput, ops/ 𝜇sec
2 threads
-cpu 0,1

-cpu 0,2

-cpu 0,8

-cpu 0

(92, 92)

(127, 127)

(128, 132)

(63, 63)

(double, double)

(151, 153)

(304, 304)

(313, 314)

(154, 155)

(double, int)

(278, 119)

(290, 127)

(313, 129)

(122, 64)

(int, int)

throughput, ops/ 𝜇sec

Slide 41/45.
Conclusion

Slide 42/45.
Enlarge your knowledge with these
simple tricks!

Reading list:
“Computer Architecture: A Quantitative Approach”
John L. Hennessy, David A. Patterson

http://www.agner.org/optimize/

Slide 43/45.
Thanks!

Slide 44/45.
Q & A

Slide 45/45.

Weitere Àhnliche Inhalte

Was ist angesagt?

Eric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemEric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemGuardSquare
 
Production Time Profiling and Diagnostics on the JVM
Production Time Profiling and Diagnostics on the JVMProduction Time Profiling and Diagnostics on the JVM
Production Time Profiling and Diagnostics on the JVMMarcus Hirt
 
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos ToolkitExploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos ToolkitSylvain Hellegouarch
 
First fare 2010 java-beta-2011
First fare 2010 java-beta-2011First fare 2010 java-beta-2011
First fare 2010 java-beta-2011Oregon FIRST Robotics
 
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKEric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKGuardSquare
 
Kernel Recipes 2013 - Automating source code evolutions using Coccinelle
Kernel Recipes 2013 - Automating source code evolutions using CoccinelleKernel Recipes 2013 - Automating source code evolutions using Coccinelle
Kernel Recipes 2013 - Automating source code evolutions using CoccinelleAnne Nicolas
 
ProGuard / DexGuard Tips and Tricks
ProGuard / DexGuard Tips and TricksProGuard / DexGuard Tips and Tricks
ProGuard / DexGuard Tips and Tricksnetomi
 
20181106 arie van_deursen_testday2018
20181106 arie van_deursen_testday201820181106 arie van_deursen_testday2018
20181106 arie van_deursen_testday2018STAMP Project
 
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguajeKotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguajeVíctor Leonel Orozco López
 
Eric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemEric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemGuardSquare
 
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streamsTech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streamsTechTalks
 
Non-blocking Michael-Scott queue algorithm
Non-blocking Michael-Scott queue algorithmNon-blocking Michael-Scott queue algorithm
Non-blocking Michael-Scott queue algorithmAlexey Fyodorov
 
Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008Jan Wedekind
 
Improving Android Performance at Mobiconf 2014
Improving Android Performance at Mobiconf 2014Improving Android Performance at Mobiconf 2014
Improving Android Performance at Mobiconf 2014Raimon RĂ fols
 
Eric Lafortune - ProGuard and DexGuard for optimization and protection
Eric Lafortune - ProGuard and DexGuard for optimization and protectionEric Lafortune - ProGuard and DexGuard for optimization and protection
Eric Lafortune - ProGuard and DexGuard for optimization and protectionGuardSquare
 
Jdk 7 4-forkjoin
Jdk 7 4-forkjoinJdk 7 4-forkjoin
Jdk 7 4-forkjoinknight1128
 
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKEric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKGuardSquare
 
ITT 2014 - Eric Lafortune - ProGuard, Optimizer and Obfuscator in the Android...
ITT 2014 - Eric Lafortune - ProGuard, Optimizer and Obfuscator in the Android...ITT 2014 - Eric Lafortune - ProGuard, Optimizer and Obfuscator in the Android...
ITT 2014 - Eric Lafortune - ProGuard, Optimizer and Obfuscator in the Android...Istanbul Tech Talks
 
JEEConf 2017 - How to find deadlock not getting into it
JEEConf 2017 - How to find deadlock not getting into itJEEConf 2017 - How to find deadlock not getting into it
JEEConf 2017 - How to find deadlock not getting into itNikita Koval
 

Was ist angesagt? (20)

Eric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemEric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build system
 
Production Time Profiling and Diagnostics on the JVM
Production Time Profiling and Diagnostics on the JVMProduction Time Profiling and Diagnostics on the JVM
Production Time Profiling and Diagnostics on the JVM
 
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos ToolkitExploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit
 
First fare 2010 java-beta-2011
First fare 2010 java-beta-2011First fare 2010 java-beta-2011
First fare 2010 java-beta-2011
 
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKEric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
 
Kernel Recipes 2013 - Automating source code evolutions using Coccinelle
Kernel Recipes 2013 - Automating source code evolutions using CoccinelleKernel Recipes 2013 - Automating source code evolutions using Coccinelle
Kernel Recipes 2013 - Automating source code evolutions using Coccinelle
 
ProGuard / DexGuard Tips and Tricks
ProGuard / DexGuard Tips and TricksProGuard / DexGuard Tips and Tricks
ProGuard / DexGuard Tips and Tricks
 
20181106 arie van_deursen_testday2018
20181106 arie van_deursen_testday201820181106 arie van_deursen_testday2018
20181106 arie van_deursen_testday2018
 
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguajeKotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
Kotlin+MicroProfile: Enseñando trucos de 20 años a un nuevo lenguaje
 
Eric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build systemEric Lafortune - The Jack and Jill build system
Eric Lafortune - The Jack and Jill build system
 
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streamsTech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams
 
Completable future
Completable futureCompletable future
Completable future
 
Non-blocking Michael-Scott queue algorithm
Non-blocking Michael-Scott queue algorithmNon-blocking Michael-Scott queue algorithm
Non-blocking Michael-Scott queue algorithm
 
Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008Real-time Computer Vision With Ruby - OSCON 2008
Real-time Computer Vision With Ruby - OSCON 2008
 
Improving Android Performance at Mobiconf 2014
Improving Android Performance at Mobiconf 2014Improving Android Performance at Mobiconf 2014
Improving Android Performance at Mobiconf 2014
 
Eric Lafortune - ProGuard and DexGuard for optimization and protection
Eric Lafortune - ProGuard and DexGuard for optimization and protectionEric Lafortune - ProGuard and DexGuard for optimization and protection
Eric Lafortune - ProGuard and DexGuard for optimization and protection
 
Jdk 7 4-forkjoin
Jdk 7 4-forkjoinJdk 7 4-forkjoin
Jdk 7 4-forkjoin
 
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDKEric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
Eric Lafortune - ProGuard: Optimizer and obfuscator in the Android SDK
 
ITT 2014 - Eric Lafortune - ProGuard, Optimizer and Obfuscator in the Android...
ITT 2014 - Eric Lafortune - ProGuard, Optimizer and Obfuscator in the Android...ITT 2014 - Eric Lafortune - ProGuard, Optimizer and Obfuscator in the Android...
ITT 2014 - Eric Lafortune - ProGuard, Optimizer and Obfuscator in the Android...
 
JEEConf 2017 - How to find deadlock not getting into it
JEEConf 2017 - How to find deadlock not getting into itJEEConf 2017 - How to find deadlock not getting into it
JEEConf 2017 - How to find deadlock not getting into it
 

Andere mochten auch

Đ–Đ”Đ»Đ”Đ·ĐœŃ‹Đ” счётчоĐșĐž ĐœĐ° стражД ĐżŃ€ĐŸĐžĐ·ĐČĐŸĐŽĐžŃ‚Đ”Đ»ŃŒĐœĐŸŃŃ‚Đž
Đ–Đ”Đ»Đ”Đ·ĐœŃ‹Đ” счётчоĐșĐž ĐœĐ° стражД ĐżŃ€ĐŸĐžĐ·ĐČĐŸĐŽĐžŃ‚Đ”Đ»ŃŒĐœĐŸŃŃ‚ĐžĐ–Đ”Đ»Đ”Đ·ĐœŃ‹Đ” счётчоĐșĐž ĐœĐ° стражД ĐżŃ€ĐŸĐžĐ·ĐČĐŸĐŽĐžŃ‚Đ”Đ»ŃŒĐœĐŸŃŃ‚Đž
Đ–Đ”Đ»Đ”Đ·ĐœŃ‹Đ” счётчоĐșĐž ĐœĐ° стражД ĐżŃ€ĐŸĐžĐ·ĐČĐŸĐŽĐžŃ‚Đ”Đ»ŃŒĐœĐŸŃŃ‚ĐžSergey Kuksenko
 
JPoint 2016 - ВалДДĐČ ĐąĐ°ĐłĐžŃ€ - ĐĄŃ‚Ń€Đ°ĐœĐœĐŸŃŃ‚Đž Stream API
JPoint 2016 - ВалДДĐČ ĐąĐ°ĐłĐžŃ€ - ĐĄŃ‚Ń€Đ°ĐœĐœĐŸŃŃ‚Đž Stream APIJPoint 2016 - ВалДДĐČ ĐąĐ°ĐłĐžŃ€ - ĐĄŃ‚Ń€Đ°ĐœĐœĐŸŃŃ‚Đž Stream API
JPoint 2016 - ВалДДĐČ ĐąĐ°ĐłĐžŃ€ - ĐĄŃ‚Ń€Đ°ĐœĐœĐŸŃŃ‚Đž Stream APItvaleev
 
Stream API: рДĐșĐŸĐŒĐ”ĐœĐŽĐ°Ń†ĐžĐž Đ»ŃƒŃ‡ŃˆĐžŃ… ŃĐŸĐ±Đ°ĐșĐŸĐČĐŸĐŽĐŸĐČ
Stream API: рДĐșĐŸĐŒĐ”ĐœĐŽĐ°Ń†ĐžĐž Đ»ŃƒŃ‡ŃˆĐžŃ… ŃĐŸĐ±Đ°ĐșĐŸĐČĐŸĐŽĐŸĐČStream API: рДĐșĐŸĐŒĐ”ĐœĐŽĐ°Ń†ĐžĐž Đ»ŃƒŃ‡ŃˆĐžŃ… ŃĐŸĐ±Đ°ĐșĐŸĐČĐŸĐŽĐŸĐČ
Stream API: рДĐșĐŸĐŒĐ”ĐœĐŽĐ°Ń†ĐžĐž Đ»ŃƒŃ‡ŃˆĐžŃ… ŃĐŸĐ±Đ°ĐșĐŸĐČĐŸĐŽĐŸĐČtvaleev
 
Javaland keynote final
Javaland keynote finalJavaland keynote final
Javaland keynote finalMarcus Lagergren
 
Open Access, Open Science, and Open Research
Open Access, Open Science, and Open ResearchOpen Access, Open Science, and Open Research
Open Access, Open Science, and Open ResearchKshitiz Khanal
 
Java 8 Puzzlers as it was presented at Codemash 2017
Java 8 Puzzlers as it was presented at Codemash 2017Java 8 Puzzlers as it was presented at Codemash 2017
Java 8 Puzzlers as it was presented at Codemash 2017Baruch Sadogursky
 
6 Common M.O.T.I.V.E.S of Salespeople
6 Common M.O.T.I.V.E.S of Salespeople6 Common M.O.T.I.V.E.S of Salespeople
6 Common M.O.T.I.V.E.S of SalespeopleSales Readiness Group
 
Balancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in RecommendationsBalancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in RecommendationsMohammad Hossein Taghavi
 
Kamailio World 2016: Update your SIP!
Kamailio World 2016: Update your SIP!Kamailio World 2016: Update your SIP!
Kamailio World 2016: Update your SIP!Olle E Johansson
 
SharePoint Permissions Worst Practices
SharePoint Permissions Worst PracticesSharePoint Permissions Worst Practices
SharePoint Permissions Worst PracticesBobby Chang
 
Le rĂŽle du manager
Le rĂŽle du managerLe rĂŽle du manager
Le rĂŽle du managerkeyros
 
Com es distribueix la corrupciĂł polĂ­tica a Espanya
Com es distribueix la corrupciĂł polĂ­tica a EspanyaCom es distribueix la corrupciĂł polĂ­tica a Espanya
Com es distribueix la corrupciĂł polĂ­tica a EspanyaRamir De Porrata-Doria
 
Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017Tracxn
 
Revenu universel 2h pour tout comprendre
Revenu universel 2h pour tout comprendreRevenu universel 2h pour tout comprendre
Revenu universel 2h pour tout comprendreLes Ă©coloHumanistes
 
Collagen and collagen disorders
Collagen and collagen disordersCollagen and collagen disorders
Collagen and collagen disordersAchi Joshi
 
Europe ai scaleups report 2016
Europe ai scaleups report 2016Europe ai scaleups report 2016
Europe ai scaleups report 2016Omar Mohout
 
Resilient Architecture
Resilient ArchitectureResilient Architecture
Resilient ArchitectureMatt Stine
 
The Next Generation of AI and Deep Learning - GTC17
The Next Generation of AI and Deep Learning - GTC17The Next Generation of AI and Deep Learning - GTC17
The Next Generation of AI and Deep Learning - GTC17NVIDIA
 
A-Z Culture Glossary 2017
A-Z Culture Glossary 2017A-Z Culture Glossary 2017
A-Z Culture Glossary 2017sparks & honey
 

Andere mochten auch (20)

Đ–Đ”Đ»Đ”Đ·ĐœŃ‹Đ” счётчоĐșĐž ĐœĐ° стражД ĐżŃ€ĐŸĐžĐ·ĐČĐŸĐŽĐžŃ‚Đ”Đ»ŃŒĐœĐŸŃŃ‚Đž
Đ–Đ”Đ»Đ”Đ·ĐœŃ‹Đ” счётчоĐșĐž ĐœĐ° стражД ĐżŃ€ĐŸĐžĐ·ĐČĐŸĐŽĐžŃ‚Đ”Đ»ŃŒĐœĐŸŃŃ‚ĐžĐ–Đ”Đ»Đ”Đ·ĐœŃ‹Đ” счётчоĐșĐž ĐœĐ° стражД ĐżŃ€ĐŸĐžĐ·ĐČĐŸĐŽĐžŃ‚Đ”Đ»ŃŒĐœĐŸŃŃ‚Đž
Đ–Đ”Đ»Đ”Đ·ĐœŃ‹Đ” счётчоĐșĐž ĐœĐ° стражД ĐżŃ€ĐŸĐžĐ·ĐČĐŸĐŽĐžŃ‚Đ”Đ»ŃŒĐœĐŸŃŃ‚Đž
 
JPoint 2016 - ВалДДĐČ ĐąĐ°ĐłĐžŃ€ - ĐĄŃ‚Ń€Đ°ĐœĐœĐŸŃŃ‚Đž Stream API
JPoint 2016 - ВалДДĐČ ĐąĐ°ĐłĐžŃ€ - ĐĄŃ‚Ń€Đ°ĐœĐœĐŸŃŃ‚Đž Stream APIJPoint 2016 - ВалДДĐČ ĐąĐ°ĐłĐžŃ€ - ĐĄŃ‚Ń€Đ°ĐœĐœĐŸŃŃ‚Đž Stream API
JPoint 2016 - ВалДДĐČ ĐąĐ°ĐłĐžŃ€ - ĐĄŃ‚Ń€Đ°ĐœĐœĐŸŃŃ‚Đž Stream API
 
Stream API: рДĐșĐŸĐŒĐ”ĐœĐŽĐ°Ń†ĐžĐž Đ»ŃƒŃ‡ŃˆĐžŃ… ŃĐŸĐ±Đ°ĐșĐŸĐČĐŸĐŽĐŸĐČ
Stream API: рДĐșĐŸĐŒĐ”ĐœĐŽĐ°Ń†ĐžĐž Đ»ŃƒŃ‡ŃˆĐžŃ… ŃĐŸĐ±Đ°ĐșĐŸĐČĐŸĐŽĐŸĐČStream API: рДĐșĐŸĐŒĐ”ĐœĐŽĐ°Ń†ĐžĐž Đ»ŃƒŃ‡ŃˆĐžŃ… ŃĐŸĐ±Đ°ĐșĐŸĐČĐŸĐŽĐŸĐČ
Stream API: рДĐșĐŸĐŒĐ”ĐœĐŽĐ°Ń†ĐžĐž Đ»ŃƒŃ‡ŃˆĐžŃ… ŃĐŸĐ±Đ°ĐșĐŸĐČĐŸĐŽĐŸĐČ
 
Javaland keynote final
Javaland keynote finalJavaland keynote final
Javaland keynote final
 
Open Access, Open Science, and Open Research
Open Access, Open Science, and Open ResearchOpen Access, Open Science, and Open Research
Open Access, Open Science, and Open Research
 
Java 8 Puzzlers as it was presented at Codemash 2017
Java 8 Puzzlers as it was presented at Codemash 2017Java 8 Puzzlers as it was presented at Codemash 2017
Java 8 Puzzlers as it was presented at Codemash 2017
 
6 Common M.O.T.I.V.E.S of Salespeople
6 Common M.O.T.I.V.E.S of Salespeople6 Common M.O.T.I.V.E.S of Salespeople
6 Common M.O.T.I.V.E.S of Salespeople
 
Balancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in RecommendationsBalancing Discovery and Continuation in Recommendations
Balancing Discovery and Continuation in Recommendations
 
Kamailio World 2016: Update your SIP!
Kamailio World 2016: Update your SIP!Kamailio World 2016: Update your SIP!
Kamailio World 2016: Update your SIP!
 
SharePoint Permissions Worst Practices
SharePoint Permissions Worst PracticesSharePoint Permissions Worst Practices
SharePoint Permissions Worst Practices
 
Le rĂŽle du manager
Le rĂŽle du managerLe rĂŽle du manager
Le rĂŽle du manager
 
Com es distribueix la corrupciĂł polĂ­tica a Espanya
Com es distribueix la corrupciĂł polĂ­tica a EspanyaCom es distribueix la corrupciĂł polĂ­tica a Espanya
Com es distribueix la corrupciĂł polĂ­tica a Espanya
 
Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017
 
Revenu universel 2h pour tout comprendre
Revenu universel 2h pour tout comprendreRevenu universel 2h pour tout comprendre
Revenu universel 2h pour tout comprendre
 
Social Media Survival Guide
Social Media Survival GuideSocial Media Survival Guide
Social Media Survival Guide
 
Collagen and collagen disorders
Collagen and collagen disordersCollagen and collagen disorders
Collagen and collagen disorders
 
Europe ai scaleups report 2016
Europe ai scaleups report 2016Europe ai scaleups report 2016
Europe ai scaleups report 2016
 
Resilient Architecture
Resilient ArchitectureResilient Architecture
Resilient Architecture
 
The Next Generation of AI and Deep Learning - GTC17
The Next Generation of AI and Deep Learning - GTC17The Next Generation of AI and Deep Learning - GTC17
The Next Generation of AI and Deep Learning - GTC17
 
A-Z Culture Glossary 2017
A-Z Culture Glossary 2017A-Z Culture Glossary 2017
A-Z Culture Glossary 2017
 

Ähnlich wie "Quantum" performance effects

How to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJITHow to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJITEgor Bogatov
 
Navigating the xDD Alphabet Soup
Navigating the xDD Alphabet SoupNavigating the xDD Alphabet Soup
Navigating the xDD Alphabet SoupDror Helper
 
Checking the code of Valgrind dynamic analyzer by a static analyzer
Checking the code of Valgrind dynamic analyzer by a static analyzerChecking the code of Valgrind dynamic analyzer by a static analyzer
Checking the code of Valgrind dynamic analyzer by a static analyzerPVS-Studio
 
The Ring programming language version 1.9 book - Part 99 of 210
The Ring programming language version 1.9 book - Part 99 of 210The Ring programming language version 1.9 book - Part 99 of 210
The Ring programming language version 1.9 book - Part 99 of 210Mahmoud Samir Fayed
 
Tools and Techniques for Understanding Threading Behavior in Android*
Tools and Techniques for Understanding Threading Behavior in Android*Tools and Techniques for Understanding Threading Behavior in Android*
Tools and Techniques for Understanding Threading Behavior in Android*IntelÂź Software
 
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source CodeA Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source CodePVS-Studio
 
COSCUP: Introduction to Julia
COSCUP: Introduction to JuliaCOSCUP: Introduction to Julia
COSCUP: Introduction to JuliaćČłèŻ 杜
 
00_Introduction to Java.ppt
00_Introduction to Java.ppt00_Introduction to Java.ppt
00_Introduction to Java.pptHongAnhNguyn285885
 
Open Cv 2005 Q4 Tutorial
Open Cv 2005 Q4 TutorialOpen Cv 2005 Q4 Tutorial
Open Cv 2005 Q4 Tutorialantiw
 
Threaded Programming
Threaded ProgrammingThreaded Programming
Threaded ProgrammingSri Prasanna
 
Intel IPP Samples for Windows - error correction
Intel IPP Samples for Windows - error correctionIntel IPP Samples for Windows - error correction
Intel IPP Samples for Windows - error correctionAndrey Karpov
 
Intel IPP Samples for Windows - error correction
Intel IPP Samples for Windows - error correctionIntel IPP Samples for Windows - error correction
Intel IPP Samples for Windows - error correctionPVS-Studio
 
Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...
Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...
Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...DroidConTLV
 
ParallelProgrammingBasics_v2.pdf
ParallelProgrammingBasics_v2.pdfParallelProgrammingBasics_v2.pdf
ParallelProgrammingBasics_v2.pdfChen-Hung Hu
 
The Ring programming language version 1.5.1 book - Part 175 of 180
The Ring programming language version 1.5.1 book - Part 175 of 180 The Ring programming language version 1.5.1 book - Part 175 of 180
The Ring programming language version 1.5.1 book - Part 175 of 180 Mahmoud Samir Fayed
 
é—œæ–ŒæžŹè©ŠïŒŒæˆ‘èȘȘçš„ć…¶ćŻŠæ˜Ż......
é—œæ–ŒæžŹè©ŠïŒŒæˆ‘èȘȘçš„ć…¶ćŻŠæ˜Ż......é—œæ–ŒæžŹè©ŠïŒŒæˆ‘èȘȘçš„ć…¶ćŻŠæ˜Ż......
é—œæ–ŒæžŹè©ŠïŒŒæˆ‘èȘȘçš„ć…¶ćŻŠæ˜Ż......hugo lu
 

Ähnlich wie "Quantum" performance effects (20)

Lecture6
Lecture6Lecture6
Lecture6
 
How to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJITHow to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJIT
 
Navigating the xDD Alphabet Soup
Navigating the xDD Alphabet SoupNavigating the xDD Alphabet Soup
Navigating the xDD Alphabet Soup
 
Checking the code of Valgrind dynamic analyzer by a static analyzer
Checking the code of Valgrind dynamic analyzer by a static analyzerChecking the code of Valgrind dynamic analyzer by a static analyzer
Checking the code of Valgrind dynamic analyzer by a static analyzer
 
The Ring programming language version 1.9 book - Part 99 of 210
The Ring programming language version 1.9 book - Part 99 of 210The Ring programming language version 1.9 book - Part 99 of 210
The Ring programming language version 1.9 book - Part 99 of 210
 
Tools and Techniques for Understanding Threading Behavior in Android*
Tools and Techniques for Understanding Threading Behavior in Android*Tools and Techniques for Understanding Threading Behavior in Android*
Tools and Techniques for Understanding Threading Behavior in Android*
 
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source CodeA Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
 
Of class1
Of class1Of class1
Of class1
 
COSCUP: Introduction to Julia
COSCUP: Introduction to JuliaCOSCUP: Introduction to Julia
COSCUP: Introduction to Julia
 
00_Introduction to Java.ppt
00_Introduction to Java.ppt00_Introduction to Java.ppt
00_Introduction to Java.ppt
 
Open Cv 2005 Q4 Tutorial
Open Cv 2005 Q4 TutorialOpen Cv 2005 Q4 Tutorial
Open Cv 2005 Q4 Tutorial
 
Threaded Programming
Threaded ProgrammingThreaded Programming
Threaded Programming
 
Intel IPP Samples for Windows - error correction
Intel IPP Samples for Windows - error correctionIntel IPP Samples for Windows - error correction
Intel IPP Samples for Windows - error correction
 
Intel IPP Samples for Windows - error correction
Intel IPP Samples for Windows - error correctionIntel IPP Samples for Windows - error correction
Intel IPP Samples for Windows - error correction
 
Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...
Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...
Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
ParallelProgrammingBasics_v2.pdf
ParallelProgrammingBasics_v2.pdfParallelProgrammingBasics_v2.pdf
ParallelProgrammingBasics_v2.pdf
 
Mlcc #4
Mlcc #4Mlcc #4
Mlcc #4
 
The Ring programming language version 1.5.1 book - Part 175 of 180
The Ring programming language version 1.5.1 book - Part 175 of 180 The Ring programming language version 1.5.1 book - Part 175 of 180
The Ring programming language version 1.5.1 book - Part 175 of 180
 
é—œæ–ŒæžŹè©ŠïŒŒæˆ‘èȘȘçš„ć…¶ćŻŠæ˜Ż......
é—œæ–ŒæžŹè©ŠïŒŒæˆ‘èȘȘçš„ć…¶ćŻŠæ˜Ż......é—œæ–ŒæžŹè©ŠïŒŒæˆ‘èȘȘçš„ć…¶ćŻŠæ˜Ż......
é—œæ–ŒæžŹè©ŠïŒŒæˆ‘èȘȘçš„ć…¶ćŻŠæ˜Ż......
 

KĂŒrzlich hochgeladen

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

KĂŒrzlich hochgeladen (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

"Quantum" performance effects

  • 1. “Quantum” Performance Effects v 2.0; November 2013 Sergey Kuksenko sergey.kuksenko@oracle.com, @kuksenk0
  • 2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. Slide 2/45.
  • 4. Intro: performance engineering 1. Computer Science → Software Engineering Build software to meet functional requirements Mostly don’t care about HW and data specifics Abstract and composable, “formal science” 2. Performance Engineering “Real world strikes back!” Exploring complex interactions between hardware, software, and data Based on empirical evidence, i.e. «natural science» Slide 4/45.
  • 5. Intro: what’s the difference? architecture vs microarchitecture Slide 5/45.
  • 6. Intro: what’s the difference? architecture vs microarchitecture x86 AMD64(x86-64/Intel64) ARMv7 .... Slide 5/45. Nehalem Sandy Bridge Bulldozer Bobcat Cortex-A9
  • 7. Intro: SUTs1 R Intel○ Core TM i5-520M (Westmere) [↓2.0 GHz] 1x2x2 Ubuntu 10.04 (32-bit) Xubuntu 12.04 (64-bit) 1 System Slide 6/45. Under Test
  • 8. Intro: SUTs1 R Intel○ Core TM i5-520M (Westmere) [↓2.0 GHz] 1x2x2 Ubuntu 10.04 (32-bit) Xubuntu 12.04 (64-bit) R Intel○ Core TM i5-3320M (Ivy Bridge) [↓2.0 GHz] 1x2x2 Ubuntu 13.10 (64-bit) Samsung Exynos 4412, ARMv7 (Cortex-A9) [1.6 GHz] 1x4x1 Linaro 12.11 1 System Slide 6/45. Under Test
  • 9. Intro: SUTs (cont.) AMD Opteron TM 4274HE (Bulldozer/Valencia) [2.5 GHz] 2x8x1 Oracle Linux Server release 6.0 (64-bit) R R Intel○ Xeon○ CPU E5-2680 (Sandy Bridge) [2.70 GHz] 2x8x2 Oracle Linux Server release 6.3 (64-bit) Slide 7/45.
  • 10. Intro: JVM OpenJDK version “1.8.0-ea-lambda” build 83, 32-bits OpenJDK version “1.8.0-ea-lambda” build 83, 64-bits OpenJDK version “1.8.0-ea” build 116, 64-bits Java HotSpot(tm) Embedded “1.8.0-ea-b79” https://jdk8.java.net/download.html Slide 8/45.
  • 12. Intro: Required JMH (Java Microbenchmark Harness) http://openjdk.java.net/projects/code-tools/jmh/ Shipilёv A., “JMH: The Lesser of Two Evils” http://shipilev.net/pub/#benchmarking Slide 10/45.
  • 14. demo1: double sum private double [] A = new double [2048]; @GenerateMicroBenchmark public double test1 () { double sum = 0.0; for ( int i = 0; i < A. length ; i ++) { sum += A [i ]; } return sum ; } @GenerateMicroBenchmark public double testManualUnroll () { double sum = 0.0; for ( int i = 0; i < A. length ; i += 4) { sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3]; } return sum ; } Slide 12/45.
  • 15. demo1: double sum private double [] A = new double [2048]; @GenerateMicroBenchmark public double test1 () { double sum = 0.0; for ( int i = 0; i < A. length ; i ++) { sum += A [i ]; } return sum ; } @GenerateMicroBenchmark public double testManualUnroll () { double sum = 0.0; for ( int i = 0; i < A. length ; i += 4) { sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3]; } return sum ; } Slide 12/45. 327 ops/msec 699 ops/msec
  • 16. demo1: looking into asm, test1 loop : addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd add cmp jl Slide 13/45. 0 x10 (% edi ,% eax ,8) ,% xmm0 0 x18 (% edi ,% eax ,8) ,% xmm0 0 x20 (% edi ,% eax ,8) ,% xmm0 0 x28 (% edi ,% eax ,8) ,% xmm0 0 x30 (% edi ,% eax ,8) ,% xmm0 0 x38 (% edi ,% eax ,8) ,% xmm0 0 x40 (% edi ,% eax ,8) ,% xmm0 0 x48 (% edi ,% eax ,8) ,% xmm0 0 x50 (% edi ,% eax ,8) ,% xmm0 0 x58 (% edi ,% eax ,8) ,% xmm0 0 x60 (% edi ,% eax ,8) ,% xmm0 0 x68 (% edi ,% eax ,8) ,% xmm0 0 x70 (% edi ,% eax ,8) ,% xmm0 0 x78 (% edi ,% eax ,8) ,% xmm0 0 x80 (% edi ,% eax ,8) ,% xmm0 0 x88 (% edi ,% eax ,8) ,% xmm0 $0x10 ,% eax % ebx ,% eax loop :
  • 17. demo1: looking into asm, testManualUnroll loop : movsd movsd movsd movsd movsd movsd addsd movsd movsd movsd movsd movsd addsd movsd addsd movsd movsd Slide 14/45. % xmm0 ,0 x20 (% esp ) 0 x48 (% eax ,% edx ,8) ,% xmm0 % xmm0 ,(% esp ) 0 x40 (% eax ,% edx ,8) ,% xmm0 % xmm0 ,0 x8 (% esp ) 0 x78 (% eax ,% edx ,8) ,% xmm0 0 x70 (% eax ,% edx ,8) ,% xmm0 0 x80 (% eax ,% edx ,8) ,% xmm1 % xmm1 ,0 x10 (% esp ) 0 x88 (% eax ,% edx ,8) ,% xmm1 % xmm1 ,0 x18 (% esp ) 0 x38 (% eax ,% edx ,8) ,% xmm4 0 x30 (% eax ,% edx ,8) ,% xmm4 0 x58 (% eax ,% edx ,8) ,% xmm5 0 x50 (% eax ,% edx ,8) ,% xmm5 0 x28 (% eax ,% edx ,8) ,% xmm1 0 x60 (% eax ,% edx ,8) ,% xmm2 movsd movsd movsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd add cmp jl 0 x68 (% eax ,% edx ,8) ,% xmm3 0 x20 (% eax ,% edx ,8) ,% xmm7 0 x18 (% eax ,% edx ,8) ,% xmm6 0 x10 (% eax ,% edx ,8) ,% xmm6 0 x10 (% esp ) ,% xmm0 % xmm7 ,% xmm6 0 x18 (% esp ) ,% xmm0 % xmm1 ,% xmm6 % xmm2 ,% xmm5 0 x20 (% esp ) ,% xmm6 % xmm3 ,% xmm5 0 x8 (% esp ) ,% xmm4 (% esp ) ,% xmm4 % xmm4 ,% xmm6 % xmm6 ,% xmm5 % xmm5 ,% xmm0 $0x10 ,% edx % ebx ,% edx loop :
  • 18. demo1: measure time private double [] A = new double [2048]; @GenerateMicroBenchmark @BenchmarkMode ( Mode . AverageTime ) @OperationsPerInvocation (2048) public double test1 () { double sum = 0.0; for ( int i = 0; i < A . length ; i ++) { sum += A [i ]; } return sum ; } @GenerateMicroBenchmark @BenchmarkMode ( Mode . AverageTime ) @OperationsPerInvocation (2048) public double testManualUnroll () { double sum = 0.0; for ( int i = 0; i < A . length ; i += 4) { sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3]; } return sum ; } Slide 15/45.
  • 19. demo1: measure time private double [] A = new double [2048]; @GenerateMicroBenchmark @BenchmarkMode ( Mode . AverageTime ) @OperationsPerInvocation (2048) public double test1 () { double sum = 0.0; for ( int i = 0; i < A . length ; i ++) { sum += A [i ]; } return sum ; } @GenerateMicroBenchmark @BenchmarkMode ( Mode . AverageTime ) @OperationsPerInvocation (2048) public double testManualUnroll () { double sum = 0.0; for ( int i = 0; i < A . length ; i += 4) { sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3]; } return sum ; } Slide 15/45. 1.5 nsec/op 0.7 nsec/op
  • 20. demo1: measure time private double [] A = new double [2048]; @GenerateMicroBenchmark @BenchmarkMode ( Mode . AverageTime ) @OperationsPerInvocation (2048) public double test1 () { double sum = 0.0; for ( int i = 0; i < A . length ; i ++) { sum += A [i ]; } return sum ; } @GenerateMicroBenchmark @BenchmarkMode ( Mode . AverageTime ) @OperationsPerInvocation (2048) public double testManualUnroll () { double sum = 0.0; for ( int i = 0; i < A . length ; i += 4) { sum += A [i] + A [i + 1] + A[i + 2] + A[i + 3]; } return sum ; } Slide 15/45. 1.5 nsec/op CPI = ∌2.5 0.7 nsec/op CPI = ∌0.6
  • 22. 𝜇Arch: x86 CISC and RISC modern x86 CPU is not what it seems All instructions (CISC) are dynamically translated into RISC-like microoperations ( 𝜇ops). Slide 16/45.
  • 25. 𝜇Arch: looking into instruction tables Instrustion 1 𝑇 ℎ𝑟𝑜𝑱𝑔ℎ𝑝𝑱𝑡 ADDSP r,r 3 1 MULSD r,r 5 1 ADD/SUB r,r 1 0.33 MUL/IMUL r,r Slide 19/45. Latency 3 1
  • 26. demo1: test1, looking into asm again loop : addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd addsd add cmp jl Slide 20/45. 0 x10 (% edi ,% eax ,8) ,% xmm0 0 x18 (% edi ,% eax ,8) ,% xmm0 0 x20 (% edi ,% eax ,8) ,% xmm0 0 x28 (% edi ,% eax ,8) ,% xmm0 0 x30 (% edi ,% eax ,8) ,% xmm0 0 x38 (% edi ,% eax ,8) ,% xmm0 0 x40 (% edi ,% eax ,8) ,% xmm0 0 x48 (% edi ,% eax ,8) ,% xmm0 0 x50 (% edi ,% eax ,8) ,% xmm0 0 x58 (% edi ,% eax ,8) ,% xmm0 0 x60 (% edi ,% eax ,8) ,% xmm0 0 x68 (% edi ,% eax ,8) ,% xmm0 0 x70 (% edi ,% eax ,8) ,% xmm0 0 x78 (% edi ,% eax ,8) ,% xmm0 0 x80 (% edi ,% eax ,8) ,% xmm0 0 x88 (% edi ,% eax ,8) ,% xmm0 $0x10 ,% eax % ebx ,% eax loop : 1.5 nsec/op ∌ 3 clk/op unroll by 16 19 insts CPI ∌ 2.5
  • 27. demo1: test1, other view 1.5 nsec/op ∌ 3 clk/op unroll by 16 19 insts CPI ∌ 2.5 Slide 21/45.
  • 29. demo1: testManualUnroll, other view 0.7 nsec/op ∌ 1.4 clk/op unroll by 4x4 36 insts CPI ∌ 0.6 Slide 23/45.
  • 30. 𝜇Arch: Dependencies 1 Performance ILP of many programs is limited by natural data dependencies. 1 Instruction Slide 24/45. Level Parallelism
  • 31. 𝜇Arch: Dependencies 1 Performance ILP of many programs is limited by natural data dependencies. What to do? Break the Dependency Chains! 1 Instruction Slide 24/45. Level Parallelism
  • 32. demo1 back: breaking chains in a “right” way Slide 25/45.
  • 33. demo1 back: breaking chains in a “right” way ... for ( int i = 0; i < A . length ; i ++) { sum += A [i ]; } return sum ; Slide 25/45.
  • 34. demo1 back: breaking chains in a “right” way ... for ( int i = 0; i < A . length ; i ++) { sum += A [i ]; } return sum ; ... for ( int i = 0; i < A . length ; i += 2) { sum0 += A [i ]; sum1 += A [i + 1]; } return sum0 + sum1 ; Slide 25/45.
  • 35. demo1 back: breaking chains in a “right” way ... for ( int i = 0; i < A . length ; i ++) { sum += A [i ]; } return sum ; ... for ( int i = 0; i < A . length ; i += 2) { sum0 += A [i ]; sum1 += A [i + 1]; } return sum0 + sum1 ; ... for ( int i = 0; i < array . length ; i += 4) { sum0 += A [i ]; sum1 += A [i + 1]; sum2 += A [i + 2]; sum3 += A [i + 3]; } return ( sum0 + sum1 ) + ( sum2 + sum3 ); Slide 25/45.
  • 36. demo1 back: double sum final results Nehalem AMD ARM testManualUnroll 0.70 0.45 3.31 test1 1.49 1.50 6.60 test2 0.75 0.79 4.25 test4 0.51 0.43 4.25 test8 0.51 0.25 2.55 time, nsec/op Slide 26/45.
  • 38. Branches: to jump or not to jump public int absSumBranch ( int a []) { int sum = 0; for ( int x : a) { if (x < 0) { sum -= x ; } else { sum += x ; } } return sum ; } Slide 28/45. loop : L1 : L2 : mov test jl add jmp sub inc cmp jl 0 xc (% ecx ,% ebp ,4) ,% ebx % ebx ,% ebx L1 % ebx ,% eax L2 % ebx ,% eax % ebp % edx ,% ebp loop
  • 39. Branches: to jump or not to jump public int absSumPredicated ( int a []) { int sum = 0; for ( int x : a) { sum += Math . abs ( x ); } return sum ; } Slide 29/45. loop : mov mov neg test cmovl add inc cmp jl 0 xc (% ecx ,% ebp ,4) ,% ebx % ebx ,% esi % esi % ebx ,% ebx % esi ,% ebx % ebx ,% eax % ebp % edx ,% ebp Loop
  • 40. demo3: results Regular Pattern = (+, –)* Nehalem Ivy Bridge Bulldozer Cortex-A9 branch_regular 0.88 0.88 0.82 5.02 branch_shuffled 6.42 1.29 2.84 9.44 branch_sorted 0.90 0.95 0.99 5.02 predicated_regular 1.33 1.16 0.92 5.33 predicated_shuffled 1.33 1.16 0.96 9.3 predicated_sorted 1.33 1.16 0.96 5.65 time, nsec/op Slide 30/45.
  • 41. demo3: results Regular Pattern = (+, +, –, +, –, –, +, –, –, +)* Nehalem Ivy Bridge Bulldozer Cortex-A9 branch_regular 1.55 1.14 0.98 5.02 branch_shuffled 6.38 1.33 2.33 9.53 branch_sorted 0.90 0.95 0.95 5.03 predicated_regular 1.33 1.16 0.95 5.33 predicated_shuffled 1.33 1.16 0.94 9.38 predicated_sorted 1.33 1.16 0.91 5.65 time, nsec/op Slide 31/45.
  • 42. demo4: && vs & public int countConditional ( boolean [] f0 , boolean [] f1 ) { int cnt = 0; for ( int j = 0; j < SIZE ; j ++) { for ( int i = 0; i < SIZE ; i ++) { if ( f0 [i] && f1 [j ]) { cnt ++; } } shuffled } sorted return cnt ; } Slide 32/45. && 4.4 nsec/op 1.5 nsec/op
  • 43. demo4: && vs & public int countLogical ( boolean [] f0 , boolean [] f1 ) { int cnt = 0; for ( int j = 0; j < SIZE ; j ++) { shuffled for ( int i = 0; i < SIZE ; i ++) { if ( f0 [i] & f1 [j ]) { sorted cnt ++; } } } shuffled return cnt ; } sorted Slide 33/45. && 4.4 nsec/op 1.5 nsec/op & 2.0 nsec/op 2.0 nsec/op
  • 44. demo5: interface invocation cost public interface I { public ... public class C0 implements I { public public class C1 implements I { public public class C2 implements I { public public class C3 implements I { public ... @GenerateMicroBenchmark @BenchmarkMode ( Mode . AverageTime ) @OperationsPerInvocation ( SIZE ) public int sum (I [] a ) { int s = 0; for (I i : a) { s += i . amount (); } return s; } Slide 34/45. int amount (); } int int int int amount (){ amount (){ amount (){ amount (){ return return return return 0; 1; 2; 3; } } } } } } } }
  • 45. demo5: results 1 target 2 targets 3 targets 4 targets 1.0 1.1 7.7 7.8 regular 1.0 7.7 19.0 shuffled 7.4 22.7 24.8 sorted time, nsec/op Slide 35/45.
  • 47. Not-a-Core: HW Multithreading Simultaneous multithreading, SMT R e.g. Intel○ Hyper-Threading Technology Fine-grained temporal multithreading e.g. CMT, Sun/Oracle ULTRASparc T1, T2, T3, T4, T5 ... Slide 37/45.
  • 48. back to demo1: Execution Units Saturation 1 thread 2 threads 4 threads DoubleSum.test1 327 654 1279 DoubleSum.test2 647 1293 1865 DoubleSum.test4 957 1916 1866 DoubleSum.testManualUnroll 699 1398 1432 overall throughput, ops/msec Slide 38/45.
  • 49. demo6: show All eyes on the the screen! Slide 39/45.
  • 50. demo6: show All eyes on the the screen! Slide 39/45.
  • 51. demo6: HDivs.heavy* results on Nehalem 1 thread int 180 double 90 throughput, ops/ 𝜇sec 2 threads -cpu 1,3 (double, int) (90, 90) (90, 90) (90, 90) (45, 45) (45, 45) (90, 180) (double, double) -cpu 3 (180, 180) (int, int) -cpu 2,3 (81, 18) (90, 45) throughput, ops/ 𝜇sec Slide 40/45.
  • 52. demo6: HDivs.heavy* results on AMD 1 thread int 128 double 306 throughput, ops/ 𝜇sec 2 threads -cpu 0,1 -cpu 0,2 -cpu 0,8 -cpu 0 (92, 92) (127, 127) (128, 132) (63, 63) (double, double) (151, 153) (304, 304) (313, 314) (154, 155) (double, int) (278, 119) (290, 127) (313, 129) (122, 64) (int, int) throughput, ops/ 𝜇sec Slide 41/45.
  • 54. Enlarge your knowledge with these simple tricks! Reading list: “Computer Architecture: A Quantitative Approach” John L. Hennessy, David A. Patterson http://www.agner.org/optimize/ Slide 43/45.
  • 56. Q & A Slide 45/45.