14. Simple
Write code
Your favorite programming language: C, C++, Objective-C, Java etc.
Compile
Compiler will transform your code into machine code
Run on target hardware
Hardware is a black box <- Right?
Wrong!
35. Code Sample
int a = ...
int b = ...
// more code...

int c = a + b;

Q: How fast is this code?
A: Depends...
40. Code Sample
int a = ...
int b = ...
// more code...

int c = a + b;

... on how fast the CPU adds two integers? NO
Any modern CPU can add integers very fast: ~1 cycle
45. Code Sample
int a = ...
int b = ...
// more code...

int c = a + b;

... on whether `a` and `b` are ready for processing, i.e. preloaded into CPU registers
Loading data from memory into a register: ~600 cycles
49. Code Sample
int a = ...
int b = ...
// more code...

int c = a + b;

Q: What is the CPU doing in the meantime?
A: Nothing! It’s waiting for data
80. Branch Prediction
if (day == Monday)      // 1
    dose = kDouble;     // 2
else
    dose = kStandard;   // 3

make_coffee(dose);      // 4

What instruction should be loaded & decoded next after // 1: // 2 or // 3?
84. Branch Prediction
if (day == Monday)      // 1
    dose = kDouble;     // 2
else
    dose = kStandard;   // 3

make_coffee(dose);      // 4

The CPU will try to predict the outcome and start to load & decode that branch in advance
If it was wrong: discard the results and flush the pipeline
130. History Lesson
For the past 30+ years we have seen huge
improvements in CPU processing power
and data sizes... but
memory speeds couldn’t keep up with that progress
144. Memory Hierarchy
iPhone 4s:
32 KB L1i (instruction cache)
32 KB L1d (data cache)
1 MB L2
512 MB DRAM
[Diagram: L1i/L1d -> L2 Cache -> Memory]
Access:
registers - 1 cycle
L1 - 5 cycles
L2 - 40 cycles
DRAM - 610 cycles
155. Cache Line
What does it mean?
Suppose you have an array of 16 floats
and you need the first float for a calculation
If it’s not already in cache, you will pay
the “full price” to load the entire cache line
The remaining 15 floats can then be accessed “for free”
(assuming the whole array sits in one 64-byte cache line: 16 × 4-byte floats)
159. Prefetch
Modern CPUs and compilers are able to
detect memory access patterns and
preload data into caches speculatively
So, data will be ready when you need it
But your data access patterns must be
very simple - linear is a good one
161. Prefetch
BTW, the C++ -> operator is sometimes referred to
as the “cache miss” operator
Can you guess why?