This talk presents message counting as a profiling technique in Pharo Smalltalk. Message counts are more stable than measurements obtained by traditional execution sampling, because they are not affected by variations in the execution environment. An experiment showed a strong correlation between the number of messages sent and the average execution time. Message counts can be used to identify performance bottlenecks and to compare performance across versions.
1. Counting Messages as a Proxy for Average Execution Time in Pharo
ECOOP 2011 - Lancaster
Alexandre Bergel
Pleiad lab, DCC, University of Chile
http://bergel.eu
3. “I like the cool new features of Mondrian, but in my setting,
drawing a canvas takes 10 seconds, whereas it took only 7
yesterday. Please do something!”
-- A Mondrian user, 2009 --
7. “I like the cool new features of Mondrian, but in my setting,
drawing my visualization takes 10 seconds, whereas it took
only 7 yesterday. Please do something!”
-- A Mondrian user, 2009 --
On my machine I find 11 and 6 seconds. What’s going on?
8. How profilers work
The method call stack is sampled every 10 ms
A counter is associated with each frame
A frame's counter is incremented each time the frame is sampled
9. How profilers work
Method call stack at time = t:
MONode displayOn: (1)
MORoot displayOn: (1)
Canvas drawOn: (1)
10. How profilers work
Method call stack at time = t + 10 ms:
MOEdge displayOn: (1)
MONode displayOn: (2)
MORoot displayOn: (2)
Canvas drawOn: (2)
11. How profilers work
Method call stack at time = t + 20 ms:
MONode setCache (1)
MORoot displayOn: (3)
Canvas drawOn: (3)
12. How profilers work
The counter is used to estimate the amount of time spent
MONode setCache (1)
MOEdge displayOn: (1)
MONode displayOn: (2)
MORoot displayOn: (3)
Canvas drawOn: (3)
13. How profilers work
The counter is used to estimate the amount of time spent: each sample counts for one 10 ms sampling period
MONode setCache (1) => 10 ms
MOEdge displayOn: (1) => 10 ms
MONode displayOn: (2) => 20 ms
MORoot displayOn: (3) => 30 ms
Canvas drawOn: (3) => 30 ms
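The counter-based estimate walked through on these slides can be sketched as a Pharo playground snippet. This is a minimal illustration, not the code of any actual profiler: the three snapshots are copied from the slides, and the variable names (samples, counters, estimates) are made up for the example.

```smalltalk
"Illustrative sketch only: turn per-frame sample counters into time estimates."
| samplingPeriodMs samples counters estimates |
samplingPeriodMs := 10.

"The three stack snapshots shown on the slides (top frame first)."
samples := {
    #('MONode displayOn:' 'MORoot displayOn:' 'Canvas drawOn:').
    #('MOEdge displayOn:' 'MONode displayOn:' 'MORoot displayOn:' 'Canvas drawOn:').
    #('MONode setCache' 'MORoot displayOn:' 'Canvas drawOn:') }.

"Each frame present in a snapshot gets its counter incremented once."
counters := Bag new.
samples do: [ :stack | counters addAll: stack ].

"Estimated time = number of samples * sampling period."
estimates := Dictionary new.
counters asSet do: [ :frame |
    estimates
        at: frame
        put: (counters occurrencesOf: frame) * samplingPeriodMs ].

estimates at: 'Canvas drawOn:'   "30"
```

The estimate is only as fine-grained as the sampling period: a method that runs between two samples is never seen at all.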
14. Problem with execution sampling #1
Strongly dependent on the execution environment
CPU, memory management, threads, virtual machine, processes
Listening to an MP3 may perturb your profile
15. Problem with execution sampling #2
Non-determinism
Even using the same environment does not help
“30000 factorial” takes between 3 803 and 3 869 ms
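The variation mentioned on this slide can be observed with a short playground snippet along the following lines. It is a sketch under assumptions: the number of runs (5) is arbitrary, and the timings you obtain depend on your machine and virtual machine, so no expected output is given.

```smalltalk
"Time the same expression several times in the same image and environment."
| timings |
timings := (1 to: 5) collect: [ :i |
    Time millisecondsToRun: [ 30000 factorial ] ].

"Report the spread between the fastest and the slowest run."
Transcript
    show: 'min: ', timings min printString, ' ms, max: ', timings max printString, ' ms';
    cr.
```

Even with nothing else changed between runs, the minimum and maximum usually differ.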
16. Problem with execution sampling #3
Lack of portability
Profiles are not reusable across platforms
Buying a new laptop will invalidate the profile you made yesterday
17. Counting messages to the rescue
Pharo is a Smalltalk dialect
Heavily based on message sending
Almost “Optimization-free compiler”
Why not count messages instead of measuring execution time?
19. Does this really work?
What about a program that waits for user input?
MyClass >> main
    self waitForUserClick
We took scenarios from unit tests, which do not rely on user input
20. Experiment A
[Figure: number of message sends (y-axis, 0 to 400 x 10^6) plotted against time in ms (x-axis, 0 to 40 000)]
The number of sent messages is related to the average execution time over multiple executions
21. Experiment B
Application     Time taken (ms)     # sent messages     ctime %     cmessages %
Collections          32 317           334 359 691        16.67         1.05
Mondrian             33 719           292 140 717         5.54         1.44
Nile                 29 264           236 817 521         7.24         0.22
Moose                25 021           210 384 157        24.56         2.47
SmallDude            13 942           150 301 007        23.93         0.99
Glamour              10 216            94 604 363         3.77         0.14
Magritte              2 485            37 979 149         2.08         0.85
PetitParser           1 642            31 574 383        46.99         0.52
Famix                 1 014             6 385 091        18.30         0.06
DSM                   4 012             5 954 759        25.71         0.17
ProfStef                247             3 381 429         0.77         0.10
Network                 128             2 340 805         6.06         0.44
AST                      37               677 439         1.26         0.46
XMLParser                36               675 205        32.94         0.46
Arki                     30               609 633         1.44         0.35
ShoutTests               19               282 313         5.98         0.11
Average                                                  13.95         0.61
Table 2. The applications considered in our experiment (second and third columns are averages over 10 runs)
The number of sent messages is more stable than the execution time over multiple executions
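The slide does not define the ctime % and cmessages % columns. Assuming they are coefficients of variation (standard deviation divided by mean, expressed as a percentage), such a stability figure could be computed as sketched below. The block name coefficientOfVariation and the sample timings are hypothetical and are not taken from the experiment.

```smalltalk
"Sketch: coefficient of variation (in %) of a series of measurements.
 Assumption: this is the statistic behind the ctime % and cmessages % columns."
| coefficientOfVariation timesMs |
coefficientOfVariation := [ :values | | mean variance |
    mean := (values inject: 0 into: [ :sum :each | sum + each ]) / values size.
    "Sample variance, dividing by n - 1."
    variance := (values inject: 0 into: [ :sum :each |
        sum + (each - mean) squared ]) / (values size - 1).
    variance sqrt / mean * 100 ].

"Hypothetical timings in ms, for illustration only."
timesMs := #(3803 3869 3820 3851 3812).
coefficientOfVariation value: timesMs
```

A lower value means the measurement varies less from one run to the next, which is the sense in which the table shows message counts to be more stable than execution times.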
22. Experiment C
[Figure: number of method invocations (y-axis, 0 to 10.0 x 10^6) plotted against time in ms (x-axis, 0 to 300)]
The number of sent messages is as useful as the execution time to identify an execution bottleneck
24. New primitive in the VM
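CompteurMethod >> run: methodName with: args in: receiver
    "Run the original method and add the messages it sent to numberOfCalls."
    | oldNumberOfCalls v |
    oldNumberOfCalls := self getNumberOfCalls.
    v := originalMethod valueWithReceiver: receiver arguments: args.
    "The constant 5 presumably discounts the sends made by this wrapper itself."
    numberOfCalls :=
        (self getNumberOfCalls - oldNumberOfCalls) + numberOfCalls - 5.
    ^ v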
25. Cost of the instrumentation
[Figure: instrumentation overhead (%) plotted against execution time in ms (0 to 40 000), shown on (a) a linear scale and (b) a logarithmic scale]
26. Contrasting Execution Sampling with Message Counting
No need for sampling
Independent from the execution environment
Stable measurements
27. Application #1
Counting messages in unit testing
CollectionTest>>testInsertion
    self
        assert: [ Set new add: 1 ]
        fasterThan: [ Set new add: 1; add: 2 ]
29. Application #2
Differencing profiling
Comparison of two successive versions of a software system
(not in the paper)
30. Application #2
Differencing profiling
Comparison of two successive versions of Mondrian
(not in the paper)
31. More in the paper
Linear regression model
We replay some optimizations from our previous work
A methodology to evaluate profiler stability over multiple runs
All the material to reproduce the experiments
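The slides do not show the linear regression model itself. As a rough illustration of the idea, an ordinary least-squares fit of execution time against message count could look like the sketch below. The data points are hypothetical and chosen so that the result is exact; nothing here reproduces the model or the coefficients from the paper.

```smalltalk
"Sketch: least-squares fit of time = slope * messages + intercept.
 The data points are hypothetical, not measurements from the experiments."
| xs ys n meanX meanY sxy sxx slope intercept |
xs := #(1 2 3 4).   "message counts, e.g. in millions (made up)"
ys := #(3 5 7 9).   "average times, e.g. in ms (made up)"
n := xs size.
meanX := (xs inject: 0 into: [ :sum :x | sum + x ]) / n.
meanY := (ys inject: 0 into: [ :sum :y | sum + y ]) / n.

"Covariance and variance terms of the least-squares formula."
sxy := (1 to: n) inject: 0 into: [ :sum :i |
    sum + (((xs at: i) - meanX) * ((ys at: i) - meanY)) ].
sxx := xs inject: 0 into: [ :sum :x | sum + (x - meanX) squared ].

slope := sxy / sxx.
intercept := meanY - (slope * meanX).
{ slope. intercept }   "slope = 2, intercept = 1 for these made-up points"
```

With a fitted slope and intercept, a message count measured on one run can be converted into an estimated average execution time.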
32. Summary
Counting method invocations is a more advantageous profiling technique in Pharo
Stable correlation between message sending and average execution time
33. Closing words
The same abstractions are used to profile applications written in C and in Java
Which objects are responsible for a slowdown?
Which arguments make a method call slow?
...
34. Counting messages as a proxy for average execution time
Alexandre Bergel
http://bergel.eu
[Recap slide: repeats the message-send and method-invocation plots, the instrumentation overhead plots, and the CollectionTest>>testInsertion example]