the art of
performance tuning
Jonathan Ross
Hello there, readers at home! These are the slides I used for my “Art of Performance Tuning” talk at JavaOne 2017. My
apologies for not sharing them as a PowerPoint file; I had some technical difficulties. My slides (done in Apple’s Keynote app)
use the XKCD font, and there is no way of embedding that font in the Mac version of PowerPoint (and most people
don’t have Keynote installed either.)

Ah well, at least these presenter notes are good for something.

The code for the demo is available at https://github.com/JoroRoss/art-of-performance.
the art of
performance tuning
Jonathan Ross
science
engineering
Of course, the title of the talk is all wrong. The original talk was called “The Art of Java Performance”, but ‘Java’ seemed a bit
redundant at a JAVA conference. We’re not tuning the JVM as such (this is not a GC tuning talk), so ‘engineering’ is a better
match. Finally, performance engineers should follow the scientific method, so ‘science’ is a better choice of word than ‘art’.
trading company

big in options and futures 

based in Amsterdam
imc Trading
I’ve worked at IMC for 19 years. IMC is a proprietary trading firm founded in Amsterdam some 25+ years ago. We’re big in HFT.
me
Playing with JVMs since 1997

Paid to do so by IMC Chicago

Theoretical physics background

CJUG board member

JCP associate member
@JoroRoss
As for me, I’ve been messing around with JVMs for quite some time. A lot of my work at IMC is in quantitative engineering,
writing numerical algorithms and the like. But I also spend plenty of time on architecture and performance engineering.
it’s too slow
Why this talk? In my work, I’m often asked to look at performance issues by other developers.
have you tried profiling it?
it’s too slow
When I ask what they have done to investigate so far, I am surprised by the lack of a plan of attack. In particular, the lack of
measurements and the tendency to try changes on a hunch stand out.

In my experience too many developers have little to no experience with micro benchmarks or profilers.
this talk
Part 1: theory
• challenges

• methodology

• implementation

Part 2: practice
• how to use a profiler

• hands-on jmh benchmarks
“Theory” in the sense of “not practice”

In this talk, I’ll present an approach to measuring and monitoring performance, and give you some hands-on demonstrations of
micro-benchmarking and profiling.

Show of hands: who regularly uses a profiler? JMH?

Target audience: if your name is Martin Thompson or Kirk Pepperdine, you probably won’t learn anything new during the next 45
minutes. This talk is for the rest of us.

Disclaimer: while it is a very important part of java performance tuning…
this talk
is not about gc tuning
Part 1: theory
• challenges

• methodology

• implementation

Part 2: practice
• how to use a profiler

• hands-on jmh benchmarks
…this talk is not about GC tuning. Or at least, it will not focus on it. For argument’s sake, let’s say that you have already seen all
of Gil’s and Kirk’s talks, your allocation charts are flatlining, and your par-new collects are measured in milliseconds rather
than seconds. Or, perhaps more realistically, that after some tuning, you have determined that it is no longer your number
one bottleneck.
Part 1: theory
challenges
objects

modern cpus

lack of determinism

virtualization

tooling
ignorance

confirmation bias

premature optimisation

over-engineering

legacy
Technical and non-technical challenges to Java performance engineering.

technical side:

- Everything is an object - pointer chasing, no fine grained-control over memory layout

- Modern hardware - this ain’t your grandfather’s Turing machine

- Lack of determinism: Garbage collector, JIT, (interaction with) other processes (!)
- Virtualization - illusions of the JVM - it’s hard to strip away the layers of indirection and find out what is actually going on
(this applies to the tools in particular)

- Tooling (or a lack thereof) - tools suffer from the same technical difficulties. Biases.
challenges
ignorance

confirmation bias

premature optimisation

over-engineering

legacy
objects

modern cpus

lack of determinism

virtualization

tooling
Human/organizational challenges:

- ignorance: probably the easiest to overcome (google/slashdot :P). 

- confirmation bias: something I often catch myself out on. When you think you’ve found a bottleneck and have sunk time into a fix, it can be
hard to reject it. “To assume is to make an ass of you and me.”

- premature optimization - 80/20 rule (Pareto principle) - “roughly 80% of the effects come from 20% of the causes” (more like 1% in java
perf?) Problem with optimization - trade-off: performance vs. maintainability/legibility

- ‘over-engineering’ (see also legacy) - hard to refactor bells and whistles

- legacy: legacy code, legacy ways of doing things.
Methodology
find a (proxy) metric for 

implement 

automate
“success”

a way of measuring it

a way of monitoring it
This is really the scientific method in disguise. Monitoring is a CS/accountability twist.

This methodology holds for any engineering task, not just java performance engineering.
implementation
make it easy to 

find root causes using





fix and add regression
tests using

learn to use and
understand





reproduce issues

production metrics

macro benchmarks

micro benchmarks

profilers

Repeatability/reproducibility is key.

This is best done if you build these systems into your architecture from the get-go, for instance as an event-sourced architecture.
Chances are you are dealing with legacy systems… Well, you can always profile in production…
implementation
make it easy to 

find root causes using





fix and add regression
tests using

learn to use and
understand

if the going gets tough



reproduce issues

production metrics

macro benchmarks

micro benchmarks

profilers

-XX:+UnlockDiagnosticVMOptions

-XX:+LogCompilation

-XX:+PrintAssembly
When the going gets tough (which is not that often – most performance issues I investigate tend to be solvable without diving too deep), you can dig
into the JIT compiler’s output and use tools like JitWatch.

Perhaps good to point out that you should also look beyond the JVM, indeed you should always start your investigations outside the JVM, and look
at your system as a whole. What else is going on? Are there competing processes? Is the host healthy? Etc. etc.
handy-dandy flowchart
production
metrics
reproduce
in benchmark
profile
implement
fix
metrics
better?
ditch fix
nope
yup
good
enough?
nope
merge and verify
in production
yup
?
find
bottleneck?
yes sir
Try to reproduce production metrics in a repeatable benchmark. This is quite easy to achieve in event-sourced systems, but it is
not too hard to retrofit legacy systems with facilities for replaying production scenarios in a loop.

Regarding the ‘ditch fix’ box: beware of the sunk cost fallacy and confirmation bias!
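The replay-in-a-loop idea can be sketched as follows. Everything here is invented for illustration (class, event type, and workload are not from the real IMC codebase): feed captured production input through the system under test repeatedly and record a timing per pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: replay a recorded list of input events through the
// system under test in a loop, reporting one timing per pass.
public class ReplayHarness {
    static long[] replay(List<String> recordedEvents,
                         Consumer<String> systemUnderTest,
                         int passes) {
        long[] nanosPerPass = new long[passes];
        for (int p = 0; p < passes; p++) {
            long start = System.nanoTime();
            for (String event : recordedEvents) {
                systemUnderTest.accept(event);   // feed captured production input
            }
            nanosPerPass[p] = System.nanoTime() - start;
        }
        return nanosPerPass;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < 1000; i++) events.add("event-" + i);
        long[] timings = replay(events, e -> e.hashCode(), 5);
        System.out.println("passes measured: " + timings.length);
    }
}
```

In an event-sourced system the `recordedEvents` list is essentially free; in a legacy system you would have to add the capture point yourself.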
handy-dandy flowchart
production
metrics
reproduce
in benchmark
profile
implement
fix
metrics
better?
ditch fix
nope
yup
good
enough?
nope
merge and verify
in production
yup
?
find
bottleneck?
yes sir• try harder
• rearchitect
• seek new job
erm…
Of course there was a choice missing in the flow chart.

If you run out of low-hanging fruit, you face some more formidable challenges, often requiring a lot of rethinking and rearchitecting.
handy-dandy flowchart
production
metrics
reproduce
in benchmark
profile
implement
fix
metrics
better?
ditch fix
nope
yup
good
enough?
nope
merge and verify
in production
yup
?
find
bottleneck?
yes sir
1. Only work on improving performance
of actual bottlenecks
2. Don’t keep a fix unless it improves
performance (or makes the code
better)
the main take-away:
Part 2: practice
A STORY OF ATAN
a (redacted) real world performance

tuning adventure involving my love for 

math and some short-sightedness
it’s too
slow
let me try to profile it
For those of you reading along from home, the second part of this talk covers a production performance regression that I
investigated at IMC a couple of months ago. We start off with some pre-recorded screenshots of a YourKit profiling session
(redacted to keep some proprietary information… proprietary), then move on to a live demo of profiling a JMH benchmark
using VisualVM and JMC.

The code for the demo is available at https://github.com/JoroRoss/art-of-performance.
To profile a production system, start your server with the JVM argument:

-agentpath:<profiler directory>/bin/linux-x86-64/libyjpagent.so
stack telemetry (Yourkit)
This example uses YourKit - but most of the functionality I am going to show here is also available in VisualVM and JMC (albeit
not quite as user friendly).

Stack telemetry is the first view I check in YourKit - it gives a good visual feel of what the app is doing. If you know a program well, this
can be a very good way of seeing whether it’s behaving normally.
weeding out irrelevant data
(tip: turn off filters by default)
I like to turn off all filters in my profiler. As a result, things like IO show up. Not what we’re interested in here. Exclude!
export threads were not supposed to be this busy…
What-if: YourKit’s drill-down is awesome. (But Shark was better!)
ordering callees by “own time” shows us the bottleneck
Now we see the export threads at the top of the list. Selecting one, we can see info about all callees in the lower panel. I like to
sort this by ‘own time’.
where is the bottleneck called from?
select method back-traces
Holy moly, we’re spending 50% of the time in the export threads in StrictMath.atan

Okay, so who is calling this method? Many ways to skin a cat, here’s one of them.
(wouldn’t you like to know!)
2 bottlenecks for the price of one
Bonus! atan was not just a bottleneck in one branch.

The redaction is didactic - no prejudices regarding what the program is doing!

The bottleneck is clear though - the ‘export’ and ‘model’ methods are both calling the one above it (called ‘price’). Let’s focus
on it.
my favorite feature of Yourkit: focusing on sub-set of data
eureka!
next favorite feature: merged callees
This ‘What-if’ tab has reduced the stack frames it is considering to the ones in which our ‘PricingService.price’ method is being
called.

Merged callees: call tree of the highlighted method, from all stack frames. Really nice this, especially for recursive calls.
To the micro benchmarks!
(demo)
This part of the talk switches to some live coding and profiling (screenshot added for the benefit of people following along from
home.)

JMH is introduced (the Java Microbenchmark Harness by Aleksey Shipilëv, part of the OpenJDK project).

Open the class FoobaratorBenchmark, Run benchmark -> ~ 2150 us 

Create and run profiler benchmark (copy benchmark or edit existing one, use 100 iterations) set to fork 0 or 1 times.

Go to JVisualVM, start sample, wait, stop, save snapshot

Go to hotspot tab -> atan

-> find in call tree

we are in business! Atan is >50% of benchmark

find call site invoking it

Navigate to NormalizedAtan in IDE
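The demo’s benchmarks themselves use JMH, which is not reproduced here (it needs the org.openjdk.jmh dependency from the repo above). As a rough sketch of what the harness automates (warmup, result consumption, repeated measurement), a naive hand-rolled version might look like this; treat its numbers with suspicion, which is precisely why JMH exists.

```java
// Naive hand-rolled timing, shown only to motivate JMH: it lacks the
// warmup control, forked JVMs, and statistics that JMH provides.
public class NaiveBenchmark {
    // A stand-in workload (invented for this sketch): lots of atan calls.
    static double workload() {
        double acc = 0.0;
        for (int i = 1; i <= 10_000; i++) {
            acc += Math.atan(1.0 / i);
        }
        return acc;
    }

    public static void main(String[] args) {
        double sink = 0.0;
        for (int i = 0; i < 100; i++) {
            sink += workload();              // crude warmup for the JIT
        }
        int runs = 100;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += workload();              // keep the result live so the
        }                                    // JIT cannot eliminate the loop
        long nsPerOp = (System.nanoTime() - start) / runs;
        // Print the sink too; JMH's Blackhole consumes results the same way.
        System.out.println("~" + nsPerOp + " ns/op (sink=" + sink + ")");
    }
}
```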
normalizedAtan(x) = (2/π) · tan⁻¹(π·x/2)
The normalized arctangent function is being used in the algorithm as a smooth range-limiting function. It’s linear near the origin
with a slope of 1, and goes to ±1 asymptotically.

The algorithm just needs this behaviour, but it doesn’t need the IEEE accuracy of StrictMath.atan!
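A plausible reading of the exact version (the real NormalizedAtan class is not public, so this is a sketch under that assumption): scale StrictMath.atan so the slope at the origin is 1 and the asymptotes are ±1.

```java
public class NormalizedAtanSketch {
    // normalizedAtan(x) = (2/π) · atan(π·x/2):
    // slope 1 at the origin, asymptotes at ±1.
    static double normalizedAtan(double x) {
        return (2.0 / Math.PI) * StrictMath.atan(Math.PI * x / 2.0);
    }

    public static void main(String[] args) {
        System.out.println(normalizedAtan(0.0));          // passes through the origin
        System.out.println(normalizedAtan(1e-6) / 1e-6);  // ~1: unit slope near 0
        System.out.println(normalizedAtan(1e6));          // ~1: upper asymptote
    }
}
```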
tan⁻¹(x) ≈ π·(0.596227·x + x²) / (2·(1 + 1.192524·x + x²))    (for x ≥ 0; extend to negative x by odd symmetry)
ApproximateAtan is a rational-function approximation of atan; the normalized version is accurate to within about 0.0018 rad.
This is the sort of trigonometry optimization that is very common in the gaming industry, and it avoids the expense of computing
the arctangent to IEEE precision.
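The approximation on the slide can be sketched as below (class and method names invented; the formula is stated for x ≥ 0, so negative inputs use the arctangent’s odd symmetry). Measuring its error against Math.atan shows it stays within a few milliradians:

```java
public class ApproxAtanSketch {
    // tan⁻¹(x) ≈ π(0.596227·a + a²) / (2(1 + 1.192524·a + a²)), a = |x|,
    // with the sign copied back for negative inputs.
    static double approxAtan(double x) {
        double a = Math.abs(x);
        double r = Math.PI * (0.596227 * a + a * a)
                 / (2.0 * (1.0 + 1.192524 * a + a * a));
        return Math.copySign(r, x);
    }

    public static void main(String[] args) {
        double maxErr = 0.0;
        for (double x = -50; x <= 50; x += 0.001) {
            maxErr = Math.max(maxErr, Math.abs(approxAtan(x) - Math.atan(x)));
        }
        System.out.printf("max abs error on [-50, 50]: %.4f rad%n", maxErr);
    }
}
```

Note that the approximation is exact at x = 1 (both sides give π/4, up to rounding in the coefficients) and shares atan’s asymptote of π/2, which is what makes it a drop-in replacement for the range-limiting use case.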
Back To the micro benchmarks!
(demo)
(Screenshot added for the benefit of people following along from home.)

The demo continues:

Run AtanBenchmark. The rational version is ~30 times faster!

Benchmark                  Mode  Cnt    Score    Error  Units
AtanBenchmark.approxAtan  thrpt  200  348.790 ± 19.346  ops/us
AtanBenchmark.atan        thrpt  200   11.119 ±  0.349  ops/us

Switch FoobarCalculator to use ApproximateAtan.normalizedAtan

Run benchmark - it’s actually not much faster (about 8%?)

Benchmark                   Mode  Cnt     Score   Error  Units
FoobaratorBenchmark.foobar  avgt  200  1722.974 ± 27.549  us/op

Our profiler was lying to us! (Safepoint bias?)
Benchmark Mode Cnt Score Error Units
FoobaratorBenchmark.benchmark avgt 100 2164.391 ± 49.520 us/op
original
Benchmark Mode Cnt Score Error Units
FoobaratorBenchmark.benchmark avgt 100 2043.580 ± 80.292 us/op
approximate, outside loop (6%)
Benchmark Mode Cnt Score Error Units
FoobaratorBenchmark.benchmark avgt 100 1991.929 ± 50.966 us/op
approximate (8% speedup)
Benchmark Mode Cnt Score Error Units
FoobaratorBenchmark.benchmark avgt 100 2045.587 ± 55.951 us/op
precise, outside loop (6%)
The benchmarks I’m running in the demo are running a bit too quickly to provide statistically reliable results (longer = better in
benchmarking and profiling). This slide shows results for the various stages of the demo run with a few more iterations.
Java mission control
(demo)
(Screenshot added for the benefit of people following along from home.)

Run FlightrecorderBenchmark or AutoJFRBenchmark

Open JMC, make a flight recording, and look at the memory tab -> we’re allocating a lot of Guava Row objects (also lambdas)

In the rest of the demo, I check out some other branches of the codebase where I have put in fixes for the memory allocation
pressure (a map of maps, avoiding using boxed doubles, avoiding using capturing lambdas - in a hot loop at least).

JFR/JMC is also lying to us - it doesn’t report samples for native methods.

Most performance gains in this demo come from reducing allocation pressure; others come from hoisting expensive
computations out of loops and avoiding duplicate calculations. The final solution presented is not pretty, but it solves the
allocation pressure issue. The best solution for this particular algorithm would probably be to restructure the input data to be
pre-partitioned, and/or to use primitive maps such as Eclipse Collections.

Allocation pressure does more than keep the garbage collector busy: it also makes it very hard to use cache lines efficiently.
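One of the fixes above, avoiding boxed doubles in a hot loop, can be illustrated like this (names invented; the real Foobarator code is not public). The boxed version allocates a fresh Double on every iteration, the primitive version allocates nothing:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of one allocation-pressure fix from the demo.
public class AllocationPressureDemo {
    // Boxed version: every addition unboxes, adds, and re-boxes,
    // allocating a new Double on the heap each time through the loop.
    static Double sumBoxed(List<Double> values) {
        Double total = 0.0;
        for (Double v : values) total = total + v;
        return total;
    }

    // Primitive version: zero allocations inside the loop.
    static double sumPrimitive(double[] values) {
        double total = 0.0;
        for (double v : values) total += v;
        return total;
    }

    public static void main(String[] args) {
        List<Double> boxed = new ArrayList<>();
        double[] primitive = new double[1000];
        for (int i = 0; i < 1000; i++) { boxed.add((double) i); primitive[i] = i; }
        System.out.println(sumBoxed(boxed).doubleValue() == sumPrimitive(primitive)); // prints "true"
    }
}
```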
Benchmark Mode Cnt Score Error Units
FoobaratorBenchmark.benchmark avgt 100 2393.484 ± 111.872 us/op
original
Benchmark Mode Cnt Score Error Units
FoobaratorBenchmark.benchmark avgt 100 1163.414 ± 17.417 us/op
map of maps (52%)
Benchmark Mode Cnt Score Error Units
FoobaratorBenchmark.benchmark avgt 100 660.802 ± 5.795 us/op
mutable double (72%)
Benchmark Mode Cnt Score Error Units
FoobaratorBenchmark.benchmark avgt 100 894.116 ± 9.971 us/op
map of maps (2) (63%)
Benchmark Mode Cnt Score Error Units
FoobaratorBenchmark.benchmark avgt 100 547.549 ± 4.122 us/op
atan outside loop again (77%)
This slide shows results for the various stages of the JMC part of the demo run with a few more iterations.
challenges revisited
ignorance

confirmation bias

premature optimisation

overengineering

legacy
objects

modern cpus

lack of determinism

virtualization

tooling
We just saw most of these challenges. The last three organizational ones are tough nuts to crack. The 80-20 rule tells us not to
worry too much about performance until we’ve found an actual bottleneck, but we saw that over-engineering and legacy
codebases can make it pretty hard to deal with the actual pain points. 

My experience is that if you stick to clean code, guided by well-chosen architectural rules of thumb, you can’t go too far
wrong. In practice, this means functional programming and adhering to the ‘tell, don’t ask’ mantra. Best practices ≠ premature
optimization.

I hope you found it interesting to hear some of my war stories.
?
@JoroRoss

www.imc.com

www.cjug.org
http://openjdk.java.net/projects/code-tools/jmh/

http://hirt.se - Marcus Hirt’s blog on JMC/JFR

https://visualvm.github.io
The demo code is available at

https://github.com/JoroRoss/art-of-performance
More information on the tools I used in the demo. Questions?

%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 

The Art Of Performance Tuning - with presenter notes!

  • 1. the art of performance tuning Jonathan Ross Hello there, readers at home! These are the slides I used for my “Art of Performance Tuning” talk at Java One 2017. My apologies for not sharing them as a power point file - I had some technical difficulties. My slides (done in Apple’s Keynote app) are using the XKCD font, and there is no way of embedding the font using the Mac version of Power point (and most people don’t have Keynote installed either.) Ah well, at least these presenter notes are good for something. The code for the demo is available at https://github.com/JoroRoss/art-of-performance.
  • 2. the art of performance tuning Jonathan Ross science engineering Of course, the title of the talk is all wrong. The original talk was called “The art of Java Performance”, but ‘Java’ seemed a bit redundant at a JAVA conference. We’re not tuning the JVM as such (this is not a GC tuning talk), so ‘engineering’ is a better match. Finally performance engineers should follow the scientific method, so ‘science’ is a better choice of word than ‘art’.
  • 3. trading company big in options and futures based in Amsterdam imc Trading I’ve worked at IMC for 19 years. IMC is a proprietary trading firm founded in Amsterdam some 25+ years ago. We’re big in HFT.
  • 4. me Playing with JVMs since 1997 Paid to do so by IMC Chicago Theoretical physics background CJUG board member JCP associate member @JoroRoss As for me, I’ve been messing around with JVMs for quite some time. A lot of my work at IMC is in quantitative engineering, writing numerical algorithms and the likes. But I also spend plenty of time on architecture and performance engineering.
  • 5. it’s too slow Why this talk? In my work, I’m often asked to look at performance issues by other developers.
  • 6. have you tried profiling it? it’s too slow When I ask what they have done to investigate so far, I am surprised by the lack of a plan-of-attack. In particular the lack of measurements and the tendency to try changes on a hunch stick out. In my experience too many developers have little to no experience with micro benchmarks or profilers.
  • 7. this talk Part 1: theory • challenges • methodology • implementation Part 2: practice • how to use a profiler • hands-on jmh benchmarks “Theory” in the sense of “not practice” In this talk, I’ll present an approach to measuring and monitoring performance, and give you some hands-on demonstrations of micro-benchmarking and profiling. Show of hands: who regularly uses a profiler? JMH? Target audience: if your name is Martin Thompson or Kirk Pepperdine, you probably won’t learn anything new during the next 45 minutes. This talk is for the rest of us. Disclaimer: while it is a very important part of java performance tuning…
  • 8. this talk is not about gc tuning Part 1: theory • challenges • methodology • implementation Part 2: practice • how to use a profiler • hands-on jmh benchmarks …this talk is not about GC tuning. Or at least, it will not focus on it. For argument’s sake, let's say that you have already seen all of Gil and Kirk's talks, your allocation charts are flat-lining, and your par-new collects are measured in milliseconds rather than seconds. Or, perhaps more realistically, that after some tuning, you have determined that it is no longer your number one bottleneck.
  • 10. challenges objects modern cpus lack of determinism virtualization tooling ignorance confirmation bias premature optimisation over-engineering legacy Technical and non-technical challenges to Java performance engineering. Technical side: - Everything is an object - pointer chasing, no fine-grained control over memory layout - Modern hardware - this ain’t your grandfather’s Turing machine - Lack of determinism: garbage collector, JIT, (interaction with) other processes (!) - Virtualization - illusions of the JVM - it’s hard to strip away the layers of indirection and find out what is actually going on (also, and in particular, for the tools) - Tooling (or a lack thereof) - tools suffer from the same technical difficulties. Biases.
  • 11. challenges ignorance confirmation bias premature optimisation over-engineering legacy objects modern cpus lack of determinism virtualization tooling Human/organizational challenges: - ignorance: probably the easiest to overcome (google/slashdot :P). - confirmation bias: something I often catch myself out on. When you think you’ve found a bottleneck and have sunk time into a fix, it can be hard to reject it. “To assume is to make an ass of you and me.” - premature optimization - 80/20 rule (Pareto principle) - “roughly 80% of the effects come from 20% of the causes” (more like 1% in java perf?) Problem with optimization - trade-off: performance vs. maintainability/legibility - ‘over-engineering’ (see also legacy) - hard to refactor bells and whistles - legacy: legacy code, legacy ways of doing things.
  • 12. Methodology find a (proxy) metric for implement automate “success” a way of measuring it a way of monitoring it This is really the scientific method in disguise. Monitoring is a CS/accountability twist. This methodology holds for any engineering task, not just java performance engineering.
  • 13. implementation make it easy to reproduce issues using production metrics; find root causes using macro benchmarks, micro benchmarks, and profilers; fix and add regression tests using micro benchmarks; learn to use and understand profilers. Repeatability/reproducibility is key. This is best done if you build these systems into your architecture from the get-go, for instance as an event-sourced architecture. Chances are you are dealing with legacy systems… Well, you can always profile in production…
  • 14. implementation make it easy to reproduce issues using production metrics; find root causes using macro benchmarks, micro benchmarks, and profilers; fix and add regression tests using micro benchmarks; learn to use and understand profilers; if the going gets tough: -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:+PrintAssembly When the going gets tough (which is not that often – most performance issues I investigate tend to be solvable without diving too deep), you can dig into the JIT compilation logs and use tools like JitWatch. Perhaps good to point out that you should also look beyond the JVM; indeed, you should always start your investigations outside the JVM and look at your system as a whole. What else is going on? Are there competing processes? Is the host healthy? Etc. etc.
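For readers following along: a typical launch command combining these diagnostic flags might look like the sketch below. The jar name is a placeholder, -XX:+PrintAssembly only produces output if the hsdis disassembler plugin is installed, and -XX:+LogCompilation writes a hotspot log file that tools like JitWatch can load.

```shell
# Hypothetical launch command; app.jar stands in for your application.
# -XX:+PrintAssembly needs the hsdis disassembler library on the JVM's library path.
java -XX:+UnlockDiagnosticVMOptions \
     -XX:+LogCompilation \
     -XX:+PrintAssembly \
     -jar app.jar
```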
  • 15. handy-dandy flowchart production metrics reproduce in benchmark profile implement fix metrics better? ditch fix nope yup good enough? nope merge and verify in production yup ? find bottleneck? yes sir Try to reproduce production metrics in a repeatable benchmark. This is quite easy to achieve in event-sourced systems, but it is not too hard to retrofit legacy systems with facilities for replaying production scenarios in a loop. Regarding the ‘ditch fix’ box: beware of the sunk cost fallacy and confirmation biases!
  • 16. handy-dandy flowchart production metrics reproduce in benchmark profile implement fix metrics better? ditch fix nope yup good enough? nope merge and verify in production yup ? find bottleneck? yes sir • try harder • rearchitect • seek new job erm… Of course there was a choice missing in the flow chart. If you run out of low-hanging fruit, you face some more formidable challenges, often requiring a lot of rethinking and rearchitecting.
  • 17. handy-dandy flowchart production metrics reproduce in benchmark profile implement fix metrics better? ditch fix nope yup good enough? nope merge and verify in production yup ? find bottleneck? yes sir 1. Only work on improving performance of actual bottlenecks 2. Don’t keep a fix unless it improves performance (or makes the code better) the main take-away:
  • 19. A STORY OF ATAN a (redacted) real world performance tuning adventure involving my love for math and some short-sightedness it’s too slow let me try to profile it For those of you reading along from home, the second part of this talk covers a production performance regression that I investigated at IMC a couple of months ago. We start off with some pre-recorded screenshots of a Yourkit profiling session (redacted to keep some proprietary information… proprietary), and then move on to a live demo of profiling a JMH benchmark using VisualVM and JMC. The code for the demo is available at https://github.com/JoroRoss/art-of-performance.
  • 20. To profile a production system, start your server with the JVM argument: -agentpath:<profiler directory>/bin/linux-x86-64/libyjpagent.so stack telemetry (Yourkit) This example is using Yourkit - but most of the functionality I am going to show here is also available in VisualVM and JMC (albeit not quite as user-friendly). Stack telemetry is the first view I check in Yourkit - it gives a good visual feel of what the app is doing. If you know a program well, this can be a very good way of seeing if it’s behaving normally.
  • 21. weeding out irrelevant data (tip: turn off filters by default) I like to turn off all filters in my profiler. As a result, things like IO show up. Not what we’re interested in here. Exclude!
  • 22. export threads were not supposed to be this busy… What-if: Yourkit’s drill-down is awesome. (But Shark was better!)
  • 23. ordering callees by “own time” shows us the bottleneck Now we see the export threads at the top of the list. Selecting one, we can see info about all callees in the lower panel. I like to sort this by ‘own time’.
  • 24. where is the bottleneck called from? select method back-traces Holy moly, we’re spending 50% of the time in the export threads in StrictMath.atan Okay, so who is calling this method? Many ways to skin a cat, here’s one of them.
  • 25. (wouldn’t you like to know!) 2 bottlenecks for the price of one Bonus! atan was not just a bottleneck in one branch. The redaction is didactic - no prejudices regarding what the program is doing! The bottleneck is clear though - the ‘export’ and ‘model’ methods are both calling the one above them (called ‘price’). Let’s focus on it.
  • 26. my favorite feature of Yourkit: focusing on sub-set of data
  • 27. eureka! next favorite feature: merged callees This ‘What-if’ tab has reduced the stack frames it is considering to the ones in which our ‘PricingService.price’ method is being called. Merged callees: call tree of the highlighted method, from all stack frames. Really nice this, especially for recursive calls.
  • 28. To the micro benchmarks! (demo) This part of the talk switches to some live coding and profiling (screenshot added for the benefit of people following along from home.) JMH is introduced (the Java Microbenchmark Harness by Aleksey Shipilev, which is part of the OpenJDK project). Open the class FoobaratorBenchmark, Run benchmark -> ~ 2150 us Create and run profiler benchmark (copy benchmark or edit existing one, use 100 iterations) set to fork 0 or 1 times. Go to JVisualVM, start sampling, wait, stop, save snapshot Go to hotspot tab -> atan -> find in call tree we are in business! Atan is >50% of benchmark find call site invoking it Navigate to NormalizedAtan in IDE
  • 29. normalizedAtan(x) = (2/π) tan⁻¹((π/2) x) The normalized arctangent function is being used in the algorithm as a smooth range-limiting function. It’s linear near the origin with a slope of 1, and goes to ±1 asymptotically. The algorithm just needs this behaviour, but it doesn’t need the IEEE accuracy of StrictMath.atan!
  • 30. tan⁻¹(x) ≈ π(0.596227x + x²) / (2(1 + 1.192524x + x²)) ApproximateAtan is a rational function approximation of atan; the normalized version is accurate to within about 0.0018 rad. This is the sort of trigonometry optimization that is very common in the gaming industry, and it avoids the expense of computing the arctangent to IEEE precision.
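For readers at home, the two formulas above can be sketched in a few lines of Java (the class and method names here are illustrative, not the demo repo's actual ApproximateAtan). The rational formula is accurate on [0, 1]; odd symmetry and the identity tan⁻¹(x) = π/2 − tan⁻¹(1/x) extend it to the rest of the real line.

```java
// Illustrative sketch of the rational approximation above.
final class ApproxAtanSketch {
    // Rational approximation of atan(x), valid for x in [0, 1].
    private static double atan01(double x) {
        return Math.PI * (0.596227 * x + x * x)
                / (2.0 * (1.0 + 1.192524 * x + x * x));
    }

    // Extends atan01 to all of x via odd symmetry and atan(x) = pi/2 - atan(1/x).
    static double atan(double x) {
        double ax = Math.abs(x);
        double r = (ax <= 1.0) ? atan01(ax) : Math.PI / 2.0 - atan01(1.0 / ax);
        return Math.copySign(r, x);
    }

    // normalizedAtan(x) = (2/pi) * atan((pi/2) * x): slope 1 at the origin,
    // asymptotically approaching +-1.
    static double normalizedAtan(double x) {
        return 2.0 / Math.PI * atan(Math.PI / 2.0 * x);
    }
}
```

This trades IEEE-precision accuracy (StrictMath.atan) for a handful of multiplications and one division, which is exactly the trade-off the algorithm can afford here.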
  • 31. Back To the micro benchmarks! (demo) (Screenshot added for the benefit of people following along from home.) The demo continues: Run AtanBenchmark. The rational version is ~30 times faster! Benchmark Mode Cnt Score Error Units AtanBenchmark.approxAtan thrpt 200 348.790 ± 19.346 ops/us AtanBenchmark.atan thrpt 200 11.119 ± 0.349 ops/us Switch FoobarCalculator to use ApproximateAtan.normalizedAtan Run benchmark - it’s actually not much faster (about 8%?) Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.foobar avgt 200 1722.974 ± 27.549 us/op Our profiler was lying to us! (Safepoint bias?)
  • 32. Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.benchmark avgt 100 2164.391 ± 49.520 us/op original Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.benchmark avgt 100 2043.580 ± 80.292 us/op approximate, outside loop (6%) Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.benchmark avgt 100 1991.929 ± 50.966 us/op approximate (8% speedup) Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.benchmark avgt 100 2045.587 ± 55.951 us/op precise, outside loop (6%) The benchmarks I’m running in the demo are running a bit too quickly to provide statistically reliable results (longer = better in benchmarking and profiling). This slide shows results for the various stages of the demo run with a few more iterations.
  • 33. Java mission control (demo) (Screenshot added for the benefit of people following along from home.) Run FlightrecorderBenchmark or AutoJFRBenchmark Open JMC, make a flight recording, look at the memory tab -> allocating a lot of Guava Row objects (also lambdas) In the rest of the demo, I check out some other branches of the codebase where I have put in fixes for the memory allocation pressure (a map of maps, avoiding boxed doubles, avoiding capturing lambdas - in a hot loop at least). JFR/JMC is also lying to us - it doesn’t report samples for native methods. Most performance gains in this demo come from reducing the allocation pressure; others come from hoisting expensive computations out of loops and avoiding duplicate calculations. The final solution presented is not pretty, but it solves the allocation pressure issue. The best solution for this particular algorithm would probably be to restructure the input data to be pre-partitioned, and/or use some primitive maps like Eclipse Collections. Allocation pressure leads to more than just keeping the garbage collector busy: it also makes it very hard to use the CPU caches effectively.
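One of the allocation-pressure fixes mentioned above (avoiding boxed doubles and capturing lambdas in a hot loop) can be sketched like this; the code is a hypothetical illustration, not the demo repo's actual fix. Each update mutates a cell in place instead of boxing a fresh Double, and the computeIfAbsent lambda captures nothing, so only the first sighting of a key allocates.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of one allocation-pressure fix discussed above:
// accumulate into a mutable cell instead of re-boxing a Double per update.
final class Accumulators {
    static final class MutableDouble {
        double value; // mutated in place; no boxing on updates
    }

    static Map<String, MutableDouble> totals(String[] keys, double[] amounts) {
        Map<String, MutableDouble> totals = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // The lambda captures nothing, so the JVM can reuse one instance;
            // a MutableDouble is allocated only the first time a key is seen.
            totals.computeIfAbsent(keys[i], k -> new MutableDouble()).value
                    += amounts[i];
        }
        return totals;
    }
}
```

Compare this with a Map<String, Double> updated via merge: there, every single update boxes a new Double, which is exactly the kind of steady garbage that shows up in the JFR memory tab.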
  • 34. Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.benchmark avgt 100 2393.484 ± 111.872 us/op original Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.benchmark avgt 100 1163.414 ± 17.417 us/op map of maps (52%) Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.benchmark avgt 100 660.802 ± 5.795 us/op mutable double (72%) Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.benchmark avgt 100 894.116 ± 9.971 us/op map of maps (2) (63%) Benchmark Mode Cnt Score Error Units FoobaratorBenchmark.benchmark avgt 100 547.549 ± 4.122 us/op atan outside loop again (77%) This slide shows results for the various stages of the JMC part of the demo run with a few more iterations.
  • 35. challenges revisited ignorance confirmation bias premature optimisation over-engineering legacy objects modern cpus lack of determinism virtualization tooling We just saw most of these challenges. The last three organizational ones are tough nuts to crack. The 80-20 rule tells us not to worry too much about performance until we’ve found an actual bottleneck, but we saw that over-engineering and legacy codebases can make it pretty hard to deal with the actual pain points. My experience is that if one sticks to clean code, guided by well-chosen architectural rules of thumb, you can’t go too far wrong. In practice, this means functional programming, and adhering to the ‘tell don’t ask’ mantra. Best practices ≠ premature optimization. I hope you found it interesting to hear some of my war stories.
  • 36. ? @JoroRoss www.imc.com www.cjug.org http://openjdk.java.net/projects/code-tools/jmh/ http://hirt.se - Marcus Hirt’s blog on JMC/JFR https://visualvm.github.io The demo code is available at https://github.com/JoroRoss/art-of-performance More information on the tools I used in the demo. Questions?