Grails has great performance characteristics, but as with all full-stack frameworks, attention must be paid to optimizing performance. In this talk Lari will discuss common missteps that can easily be avoided and share tips and tricks that help profile and tune Grails applications.
2. "Programmers waste enormous amounts of time thinking
about, or worrying about, the speed of noncritical parts
of their programs, and these attempts at efficiency
actually have a strong negative impact when debugging
and maintenance are considered. We should forget
about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we
should not pass up our opportunities in that critical 3%."
- Donald Knuth, 1974
3. Mature performance optimisation
• Find out the quality requirements of your solution. Keep
learning about them and keep them up-to-date. It's a
moving target.
• Keep up the clarity and the consistency of your solution.
• Don't introduce accidental complexity.
• Don't do things just "because this is faster" or
someone thinks so.
• Start doing mature performance tuning and optimisation
today!
9. Lari's Grails Performance Tuning Method™
• Look for 3 things:
• Slow database operations - use a profiler that shows SQL
statements
• Thread blocking - shows up as high object monitor usage
in the profiler
• Exceptions used in normal program flow - easy to check
in the profiler
• Pick the low-hanging fruit
• Find the most limiting bottleneck and eliminate it
• Iterate
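To make the third item concrete, here is a minimal Java sketch assuming a hypothetical helper that parses an optional numeric request parameter. The first method relies on an exception for the expected "not a number" case; the second checks the input explicitly, so that path throws nothing. Exception-heavy code like the first variant is what shows up as a high exception count in the profiler.

```java
// Contrasts exception-driven control flow with an explicit check.
final class ExceptionFlow {

    // Anti-pattern: every non-numeric input creates and throws a
    // NumberFormatException, including filling in its stack trace.
    static int parseOrDefaultViaException(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Alternative: validate first, so the expected "invalid input" path
    // never throws. Simplification: accepts only unsigned decimal strings
    // of 1 to 9 digits, which always fit in an int.
    static int parseOrDefault(String s, int fallback) {
        if (s == null || s.isEmpty() || s.length() > 9) {
            return fallback;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return fallback;
            }
        }
        return Integer.parseInt(s);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"42", "abc", "", "007"}) {
            System.out.println(input + " -> " + parseOrDefault(input, -1));
        }
    }
}
```

The explicit check only pays off where invalid input is common; for rare failures, the exception-based version is simpler and fine.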
11. What's the goal of performance tuning?
• The primary goal of performance tuning is to assist in
fulfilling the quality requirements and constraints of your
system.
• Meeting the quality requirements makes you and your
stakeholders happy: your customers, your business
owners, and you in dev & ops.
13. Operational efficiency
• Tuning your system to meet its quality requirements
at optimal cost
• Optimising costs to run your system - operational
efficiency
15. Performance tuning improvement cycle
• Measure & profile
o start with the tools you have
available. You can add more tools
and methods in the next iteration.
• Think & learn, analyse and plan
the next change
o find tools and methods to measure, in the next
iteration, the things you want to know more about
16. Iterate, Iterate, Iterate
• Iterate: do a lot of iterations and change one thing at a
time
• learn gradually about your system's performance and
operational aspects
17. Feedback from production
• Set up a different feedback cycle for production
environments.
• Remember that it is usually irrelevant whether the system
performs well on your laptop.
• If you are not involved in operations, use innovative
means to set up a feedback cycle.
19. If your requirement is to lower latency
• Amdahl's law - the speedup of a single computation task is
limited by the part of it that cannot be parallelized.
• In an ordinary synchronous, blocking Servlet API
programming model, you have to make sure that the use
of shared locks and resources is minimised.
• Reducing thread blocking (object monitor usage) is a key
principle for improving performance - Amdahl's law
explains why: time spent waiting for a shared lock is
serial execution.
• The ideal is lock-free request handling when the
synchronous Servlet API is used.
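A small Java sketch of Amdahl's law, S(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the number of workers. It makes the simplifying assumption that time spent waiting for shared locks counts as serial work, and shows how the serial fraction caps the speedup no matter how many request threads you add.

```java
// Amdahl's law: S(n) = 1 / ((1 - p) + p / n).
final class Amdahl {

    static double speedup(double parallelFraction, int workers) {
        if (parallelFraction < 0 || parallelFraction > 1 || workers < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        double serial = 1.0 - parallelFraction; // e.g. time spent holding a shared lock
        return 1.0 / (serial + parallelFraction / workers);
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.5, 0.9, 0.99}) {
            // The last column is the upper limit as n grows without bound: 1 / (1 - p).
            System.out.printf("p=%.2f  n=8: %.2fx  n=64: %.2fx  limit: %.2fx%n",
                    p, speedup(p, 8), speedup(p, 64), 1.0 / (1.0 - p));
        }
    }
}
```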
20. Understand Little's law in your context
• With Little's law you can calculate and reason about
which programming models fit your requirements
and available resources
• the traditional Servlet API thread-per-request model
could fit your requirements, and you can still make it
"fast" (low latency) in most cases.
21. Cons of the thread-per-request model in the light of Little's law and Amdahl's law
• From Little's law: MeanNumberInSystem =
MeanThroughput * MeanResponseTime
• In the thread-per-request model, the upper bound for
MeanNumberInSystem is the maximum number of
request-handling threads. This might limit the throughput of
the system, especially when response times get longer
or request-handling threads get blocked and hang.
• Shared locks and resources might set the upper bound to
a very low value. Such problems get worse under error
conditions.
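A worked example of this bound as a minimal Java sketch. The pool size of 200 threads and the response times are hypothetical numbers chosen for illustration. Rearranging Little's law with MeanNumberInSystem capped at the thread count gives a maximum throughput of threads / MeanResponseTime.

```java
// Little's law: MeanNumberInSystem = MeanThroughput * MeanResponseTime.
// With thread-per-request, MeanNumberInSystem <= number of request threads,
// so MeanThroughput <= threads / MeanResponseTime.
final class LittlesLaw {

    static double maxThroughputPerSecond(int requestThreads, double meanResponseTimeSeconds) {
        if (requestThreads < 1 || meanResponseTimeSeconds <= 0) {
            throw new IllegalArgumentException("threads >= 1 and response time > 0 required");
        }
        return requestThreads / meanResponseTimeSeconds;
    }

    public static void main(String[] args) {
        int threads = 200; // hypothetical pool size
        // Normal operation vs. a slow backend or lock contention raising response times
        for (double w : new double[] {0.05, 0.5, 5.0}) {
            System.out.printf("W=%.2fs -> at most %.0f requests/s%n",
                    w, maxThroughputPerSecond(threads, w));
        }
    }
}
```

With 200 threads, a mean response time of 0.5 s allows at most 400 requests/s; if blocking pushes it to 5 s, the same pool sustains at most 40 requests/s.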
22. Advantages of thread-per-request model
• We are used to debugging the thread-per-request model
- adding breakpoints, attaching the debugger and going
through the stack
• The synchronous, blocking, procedural programming
model is familiar to most programmers.
• There is friction in switching to different programming
models and paradigms.
23. Killer app for the non-blocking async model
• Responsive streaming to a high number of clients on a
single box
• continuously connected real-time apps where low latency
and high availability are requirements
• limited resources (must be efficient/optimal)
25. JVM code profiler concepts
• Sampling
• a statistical way to get information about the execution: the JVM
profiling interfaces are queried at a given time interval, for example every
100 milliseconds, and statistical methods are used to calculate values
from the samples.
o Unreliable results, but certainly useful in some cases, since the
overhead of sampling is minimal compared to instrumentation
o Usually helps you get a better understanding of the problem if you learn
to look past the numeric values returned from measurements.
• Instrumentation
o exact measurements of method execution details
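To illustrate the sampling concept, here is a toy Java sampler. It is a sketch, not how production profilers are implemented: it periodically reads another thread's stack with Thread.getStackTrace() and counts which method is on top. The busy worker thread is a hypothetical workload. Stacks obtained this way are captured at safepoints, which biases the results, one more reason to treat sampled numbers as indicative rather than exact.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sampling profiler: counts how often each method is the top stack frame.
final class TinySampler {

    static Map<String, Integer> sampleTopFrames(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) { // empty if the thread hasn't started or has ended
                String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(frame, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis); // the sampling interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return counts;
    }

    // Hypothetical workload: burns CPU until interrupted.
    static Thread startBusyWorker() {
        Thread worker = new Thread(() -> {
            double x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += Math.sqrt(x + 1);
            }
        }, "busy-worker");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    public static void main(String[] args) {
        Thread worker = startBusyWorker();
        Map<String, Integer> counts = sampleTopFrames(worker, 50, 10);
        worker.interrupt();
        counts.forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```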
26. Load testing tools and services
• Simple command line tools
• wrk https://github.com/wg/wrk
• modern HTTP benchmarking tool
o has Lua scripting support for things like
verifying the response
• Load testing toolkits and service providers
• Support testing of full use cases and stateful flows
• toolkits: JMeter (http://jmeter.apache.org/),
Gatling (http://gatling.io/)
27. Common pitfalls in profiling Grails
• Measuring wall clock time
• Measuring CPU time
• Instrumentation often gives misleading results because
of JIT compilation and other effects such as spin locks
• Lack of proper JVM warmup
• Relying on gut feeling and being lazy
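The warmup pitfall can be shown with a naive Java timing harness. This is only a sketch with a hypothetical workload: the first calls run in the interpreter and trigger JIT compilation, so timing them mixes in compilation effects. For real measurements a harness such as JMH is the better choice, since it also handles dead-code elimination and other pitfalls this sketch ignores.

```java
// Naive timing harness illustrating JVM warmup.
final class WarmupDemo {

    // Written to a volatile field so the JIT cannot treat the work as dead code.
    static volatile long blackhole;

    // Hypothetical workload to measure.
    static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += Integer.toString(i).hashCode();
        }
        return sum;
    }

    // Average nanoseconds per call over `measured` calls, after `warmup` untimed calls.
    static double averageNanos(int warmup, int measured, int n) {
        long sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += workload(n);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink += workload(n);
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / measured;
    }

    public static void main(String[] args) {
        // Cold run first, in a fresh JVM, so nothing has been compiled yet.
        System.out.printf("cold: %.0f ns/call%n", averageNanos(0, 20, 1_000));
        System.out.printf("warm: %.0f ns/call%n", averageNanos(5_000, 20, 1_000));
    }
}
```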
28. Ground your feet
• Find a way to review production performance graphs regularly,
especially after making changes to the system
• system utilisation over time (CPU load, IO load & wait, Memory
usage), system input workload (requests) over time, etc.
• In the cloud, use tools like New Relic to get visibility into operations
• CloudFoundry-based Pivotal Web Services and IBM Bluemix
have New Relic available
• In the development environment, use a profiler and a debugger to
gain understanding. You can use the grails-melody plugin to get
insight into the SQL that is executed.
29. Grails - the low-hanging fruit
• Improper JVM config
• Slow SQL
• Blocking caused by caching
• Bad regexps
• Unnecessary database transactions
• Watch out for blocking in the Java API: Hashtable
30. Environment-related problems
• Improper JVM configuration for Grails apps
• out-of-the-box Tomcat parameters
• a single JVM running with a huge heap on a big box
o If you have a big powerful box, it's better to run
multiple small JVMs and put a load balancer in front
of them
31. Example of a proper Tomcat config for *nix
Create a file setenv.sh in the tomcat_home/bin directory:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_60
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
CATALINA_OPTS="$CATALINA_OPTS -server -noverify"
# tune heap size (MaxPermSize only applies to JDK 7 and earlier)
CATALINA_OPTS="$CATALINA_OPTS -XX:MaxPermSize=256M -Xms768M -Xmx768M"
CATALINA_OPTS="$CATALINA_OPTS -Djava.net.preferIPv4Stack=true" # prefer IPv4 if IPv6 is not used
# set default file encoding and locale
CATALINA_OPTS="$CATALINA_OPTS -Dfile.encoding=UTF-8 -Duser.language=en -Duser.country=US"
CATALINA_OPTS="$CATALINA_OPTS -Duser.timezone=CST" # set default timezone
CATALINA_OPTS="$CATALINA_OPTS -Dgrails.env=production" # set grails environment
# set timeouts (in milliseconds) for the JVM URL handlers
CATALINA_OPTS="$CATALINA_OPTS -Dsun.net.client.defaultConnectTimeout=10000 -Dsun.net.client.defaultReadTimeout=10000"
CATALINA_OPTS="$CATALINA_OPTS -Duser.dir=$CATALINA_HOME" # set user.dir
export CATALINA_OPTS
export CATALINA_PID="$CATALINA_HOME/logs/tomcat.pid"
32. JVM heap size
• Assumption: optimising throughput and latency at the cost of
memory consumption
• set the minimum and maximum heap size to the same value to
prevent compaction (which causes a full GC)
• look at the presentation recording of "Tuning Large scale
Java platforms" by Emad Benjamin and Jamie O'Meara for more.
• rule-of-thumb recommendation for heap size: survivor space
size x 3...4, and don't exceed the NUMA node's local memory size
for your server configuration (use "numactl --hardware" to find
out the NUMA node size on Linux).
33. The most common problem: SQL
• SQL and database related bottlenecks: learn how to profile
SQL queries and tune your database queries and your
database
• The grails-melody plugin can be used to spot costly SQL
queries in development and testing environments.
Nothing prevents using it in production; however, there is a
risk that running it in a production environment has
negative side effects.
• New Relic in CloudFoundry (works for production
environments)
34. Use a non-blocking cache implementation
• Guava LoadingCache is a good candidate
https://code.google.com/p/guava-libraries/wiki/CachesExplained
• From the LoadingCache.refresh(key) documentation: "While the new value is loading the previous value (if any)
will continue to be returned by get(key) unless it is evicted.
If the new value is loaded successfully it will replace the
previous value in the cache; if an exception is thrown while
refreshing the previous value will remain, and the exception
will be logged and swallowed." (http://docs.guava-libraries.googlecode.com/git-
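Guava is the recommended option here. For illustration only, the following stdlib-only Java sketch shows the behaviour described in the quote: the first load of a key blocks, but a refresh runs on an executor while readers keep getting the previous value, and a failed reload keeps the old one. The class name RefreshingCache is hypothetical and not part of Guava; it has none of Guava's eviction, expiry or logging.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Sketch of "keep serving the previous value while a refresh runs".
final class RefreshingCache<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, Boolean> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final Executor refreshExecutor;

    RefreshingCache(Function<K, V> loader, Executor refreshExecutor) {
        this.loader = loader;
        this.refreshExecutor = refreshExecutor;
    }

    // The first load of a key blocks callers asking for that key; after that,
    // reads return the current value and never wait for a refresh.
    V get(K key) {
        return values.computeIfAbsent(key, loader);
    }

    // Reloads a key on the executor. Readers keep getting the previous value
    // until the new one is stored; if loading fails, the previous value stays.
    void refresh(K key) {
        if (inFlight.putIfAbsent(key, Boolean.TRUE) != null) {
            return; // a refresh for this key is already running
        }
        Runnable task = () -> {
            try {
                V fresh = loader.apply(key);
                if (fresh != null) {
                    values.put(key, fresh);
                }
            } catch (RuntimeException e) {
                // keep the previous value; a production cache would log this
            } finally {
                inFlight.remove(key);
            }
        };
        try {
            refreshExecutor.execute(task);
        } catch (RuntimeException rejected) {
            inFlight.remove(key); // e.g. the executor has been shut down
            throw rejected;
        }
    }
}
```

In an application the executor would typically be a small dedicated thread pool, for example one created with Executors.newSingleThreadExecutor(); passing Runnable::run makes refreshes synchronous, which is handy in tests.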
35. Some regexps are CPU hogs
https://twitter.com/lhotari/status/474591343923449856
37. Verify regexps against catastrophic backtracking
• Verify regexps that are used a lot
• use the profiler's CPU time measurements to spot them
• search the code for candidate regexps
• Use a regexp analyser to check regexps with different input sizes
(jRegExAnalyser/RegexBuddy).
• Make sure valid input doesn't trigger "catastrophic backtracking".
• Understand what it is.
• http://www.regular-expressions.info/catastrophic.html
• "The solution is simple. When nesting repetition operators, make
absolutely sure that there is only one way to match the same
match"
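A minimal Java sketch of catastrophic backtracking, using the classic textbook pattern (a+)+b rather than any regexp from a real app. On a run of 'a's that doesn't end in 'b', the nested quantifiers can split the run in exponentially many ways before failing; the flattened a+b accepts the same strings with only one way to match, as the quote recommends. Timings depend on the JDK version and the machine, so treat the printed numbers as a relative comparison only.

```java
import java.util.regex.Pattern;

final class BacktrackingDemo {
    // Nested repetition: a run of 'a's can be divided between the inner and
    // outer '+' in exponentially many ways, and all are tried before failing.
    static final Pattern NESTED = Pattern.compile("(a+)+b");
    // Same set of matching strings, but only one way to match each input.
    static final Pattern FLAT = Pattern.compile("a+b");

    static long matchNanos(Pattern pattern, String input) {
        long start = System.nanoTime();
        pattern.matcher(input).matches();
        return System.nanoTime() - start;
    }

    // n 'a's followed by 'c': there is no 'b', so every match attempt fails.
    static String failingInput(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('c').toString();
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 22; n += 4) {
            String input = failingInput(n);
            System.out.printf("n=%d nested=%dus flat=%dus%n", n,
                    matchNanos(NESTED, input) / 1_000, matchNanos(FLAT, input) / 1_000);
        }
    }
}
```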
38. Eliminate unnecessary database transactions in Grails
• Use "static transactional = false" in services that
don't need transactions
• Don't call transactional services from GSP taglibs (or
GSP views); that might cause a large number of short
transactions during view rendering
39. JDK has a lot of unnecessary blocking
• java.util.Hashtable/Properties is blocking (Hashtable's
methods are synchronized)
• these block:
System.getProperty("some.config.value", "some.default"),
Boolean.getBoolean("some.feature.flag")
• Instantiating PrintWriter, Locale, NumberFormat,
currency formats etc. can block too, since many of them
call System.getProperty.
• Consider monkey patching the JDK's Hashtable class:
https://github.com/stephenc/high-scale-lib
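One common mitigation, sketched in Java under the assumption that the values don't change while the application runs: read the properties once into static final fields, so request-handling code does a plain field read instead of going through System.getProperty on every call. The property names are the placeholders from the slide, and StartupConfig is a hypothetical class name.

```java
// Reads system properties once, when the class is initialized. Later reads
// are plain static field accesses; on JDK 7 and 8 this avoids a synchronized
// Hashtable lookup (java.util.Properties extends Hashtable) on every request.
final class StartupConfig {
    static final String CONFIG_VALUE =
            System.getProperty("some.config.value", "some.default");
    static final boolean FEATURE_FLAG = Boolean.getBoolean("some.feature.flag");

    private StartupConfig() {
        // constants holder, not meant to be instantiated
    }
}
```

Hot code then checks StartupConfig.FEATURE_FLAG directly. The trade-off is that changing the property at runtime no longer has any effect.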
40. Misc Grails tips
• Use singleton scope in controllers
• grails.controllers.defaultScope = 'singleton'
• the default for new apps for a long time, but might be a
problem for upgraded apps
• when changing, make sure your controllers don't use
fields for request state handling (that was OK with
prototype scope)
• Use controller methods (replace closures with
methods in upgraded apps)
42. Simple inspection in production environments
• kill -3 <PID> or jstack <PID>
• Makes a thread dump of all threads. With kill -3 the JVM
writes it to System.out, which ends up in catalina.out in the
default Tomcat config; jstack prints it to its own console.
• the Java process keeps running; it doesn't get
terminated
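When jstack isn't at hand, a similar dump can be taken from inside the JVM with the standard java.lang.management API. The Java sketch below (class name hypothetical) builds a thread dump string and counts threads in the BLOCKED state, the thread-blocking symptom discussed earlier. ThreadInfo.toString() truncates deep stacks, so for full stacks kill -3 or jstack is still the better tool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumps {

    // Builds a thread dump of the current JVM, including lock information
    // when the JVM supports it.
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported())) {
            if (info != null) {
                out.append(info); // ThreadInfo.toString() shows only the top frames
            }
        }
        return out.toString();
    }

    // Counts threads currently waiting to enter an object monitor.
    static long countBlocked() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long blocked = 0;
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("BLOCKED threads: " + countBlocked());
        System.out.print(dumpAllThreads());
    }
}
```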
43. Java Mission Control & Flight Recorder
• Oracle JDK 7 and 8 include Java Mission Control (since
1.7.0_40).
• the JAVA_HOME/bin/jmc executable launches the JMC
client UI
• JMC includes Java Flight Recorder, which has been
designed to be used in production.
• JFR can record data without the UI and store events in
a circular buffer for investigating production
problems.
44. JFR isn't free
• JFR is a commercial, non-free feature, available only in
Oracle JVMs (originally from JRockit).
• You must buy a license from Oracle for each computer
running a JVM that uses it.
• "... require Oracle Java SE Advanced or Oracle Java
SE Suite licenses for the computer running the
observed JVM",
http://www.oracle.com/technetwork/java/javase/documentation/java-se-product-editions-397069.pdf, page 5
45. Controlling JFR
• enabling JFR with the default continuous "black box"
recording:
export _JAVA_OPTIONS="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true"
• Runtime control using jcmd commands
• help for the commands:
jcmd <pid> help JFR.start
jcmd <pid> help JFR.stop
jcmd <pid> help JFR.dump
jcmd <pid> help JFR.check