English version of the presentation we gave at Devoxx FR 2012.
An in-depth analysis of how the Java garbage collector works and how to minimise pauses in your application.
1. Death by pauses Everything you ever wanted to know about GC pauses*
*but were afraid to ask
Tuesday, July 10, 12
2. Agenda
1. Introduction
2. Crime Scene Investigation
3. JVM Memory management systems and tools
4. Putting it together
3. The Crime Scene
PG 13*
* Parents strongly cautioned: typed language, dead objects and verbose logs may not be suitable to scripting language fans
6. B2C e-commerce platform
Stack: Apache, Tomcat, Oracle
• 12+ servers
• 10 different webapps
• 50+ JVMs (Oracle JDK6)
• > 30000 sessions
• 250-400 req/s
• Variance is high
8. ... an unusual victim...
Product catalog modeled as a Graph
100% custom implementation
100% on-heap (no SQL except for initial load)
in-place update by AtomicReference.set()
Caching aggressively is not possible
Large number of request-scoped objects
Many web service calls into back-office systems = latency
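The "in-place update by AtomicReference.set()" idea can be pictured with a minimal sketch. The names CatalogNode and Product are hypothetical and are not taken from the platform's code: each node keeps its payload in an AtomicReference, so an update swaps in a fresh immutable value while readers keep walking the same on-heap graph. The replaced value becomes garbage once no reader still holds it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable payload; the real catalog model is not shown in the slides.
final class Product {
    final String sku;
    final long priceCents;

    Product(String sku, long priceCents) {
        this.sku = sku;
        this.priceCents = priceCents;
    }
}

// Hypothetical graph node: edges stay in place, only the payload reference is swapped.
final class CatalogNode {
    private final AtomicReference<Product> payload;
    private final List<CatalogNode> children = new CopyOnWriteArrayList<CatalogNode>();

    CatalogNode(Product initial) {
        this.payload = new AtomicReference<Product>(initial);
    }

    Product get() {
        return payload.get();
    }

    // In-place update: a reader sees either the old or the new Product, never a mix.
    void update(Product replacement) {
        payload.set(replacement);
    }

    void addChild(CatalogNode child) {
        children.add(child);
    }

    List<CatalogNode> children() {
        return children;
    }
}

final class CatalogDemo {
    public static void main(String[] args) {
        CatalogNode root = new CatalogNode(new Product("ROOT", 0));
        CatalogNode shoe = new CatalogNode(new Product("SHOE-42", 5990));
        root.addChild(shoe);

        // The previous Product is unreachable after this call and will be collected.
        shoe.update(new Product("SHOE-42", 4990));
        System.out.println(root.children().get(0).get().priceCents);
    }
}
```

Keeping the payload immutable means a swap is a single reference write, at the cost of one new object per update.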
29. The usual suspects...
• OutOfMemory Heap
• OutOfMemory PermGen
• Long GC pauses
➡ under high load = immediate death
Death by pauses
30. Why do we need this GC thing again?
“Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free.”
Cliff Click
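To make the quote concrete, here is a minimal sketch of a classic lock-free stack (often called a Treiber stack). It is an illustration chosen for this write-up, not something shown in the talk. Because the garbage collector reclaims a node only once no thread can still reach it, pop() never touches memory that has been freed and reused, which is the hard part when memory is released explicitly.

```java
import java.util.concurrent.atomic.AtomicReference;

final class TreiberStack<T> {
    // Immutable node: once published it is never modified.
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<Node<T>>();

    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<T>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if another thread won
    }

    // Returns null when the stack is empty.
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value;
    }

    public static void main(String[] args) {
        TreiberStack<String> stack = new TreiberStack<String>();
        stack.push("first");
        stack.push("second");
        System.out.println(stack.pop());
        System.out.println(stack.pop());
    }
}
```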
41. Fine, we just need to tune the JVM, right?...
POP QUIZ!
Number of command-line flags*?
less than 100 flags
100 <= X < 200
200 <= X < 300
300 <= X < 400
400 <= X < 500
500 <= X < 600
600 <= X < 700: 664 flags!
* Oracle JVM 1.6.0_31 x86_64 server
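The slides do not say how the 664 figure was obtained. One way to get a comparable count on a HotSpot JVM is to run it with -XX:+PrintFlagsFinal, which prints one line per flag, and count those lines. The sketch below assumes a "java" binary on the PATH that accepts this option; the class name FlagCounter is made up for the example, and the count will differ between JVM versions.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

final class FlagCounter {
    // A flag line looks like "uintx MaxHeapSize := 2147483648 {product}":
    // a type, a name, then "=" or ":=" (":=" marks a value changed from its default).
    static boolean isFlagLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        return tokens.length >= 3 && (tokens[2].equals("=") || tokens[2].equals(":="));
    }

    static int countFlags(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (isFlagLine(line)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: "java" on the PATH is a HotSpot JVM supporting -XX:+PrintFlagsFinal.
        Process p = new ProcessBuilder("java", "-XX:+PrintFlagsFinal", "-version")
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<String>();
        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = out.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            out.close();
        }
        p.waitFor();
        System.out.println(countFlags(lines) + " flags");
    }
}
```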
72. Garbage Collectors
• Generational
• Stop the world!
• Throughput or Concurrent
74. GC characteristics
Young
Serial Parallel
Serial Default
Old Parallel N/A
Concurrent
81. GC characteristics
                   Young: Serial    Young: Parallel
Old: Serial        Serial           Parallel
Old: Parallel      -                ParallelOld
Old: Concurrent    CMS Serial       CMS
The parallel implementations actually differ for each variant
84. CMS is the right choice
Average test duration (s):
Serial        917
Parallel      852
ParallelOld   846
CMS           871
CMS Serial    937
85. Tools: CLI
jps, jhat, jmap, jstack, jstat
$ jstat -gcutil PID
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  40.88  58.41  18.34  66.65   2729  316.538    46    6.820  323.358
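The counters in this output can be turned into averages: YGCT / YGC gives the mean time per young collection and FGCT / FGC the mean time per full collection. For the sample line that is 316.538 / 2729, roughly 0.116 s, and 6.820 / 46, roughly 0.148 s. The sketch below does this arithmetic on one data line; the class name is made up for the example, and averages naturally hide the outliers that hurt most.

```java
final class JstatAverages {
    // Column order of "jstat -gcutil": S0 S1 E O P YGC YGCT FGC FGCT GCT
    // Returns { average young collection time, average full collection time } in seconds.
    static double[] averagePauses(String dataLine) {
        String[] f = dataLine.trim().split("\\s+");
        long ygc = Long.parseLong(f[5]);
        double ygct = Double.parseDouble(f[6]);
        long fgc = Long.parseLong(f[7]);
        double fgct = Double.parseDouble(f[8]);
        return new double[] {
            ygc == 0 ? 0.0 : ygct / ygc,
            fgc == 0 ? 0.0 : fgct / fgc
        };
    }

    public static void main(String[] args) {
        String line = "0.00 40.88 58.41 18.34 66.65 2729 316.538 46 6.820 323.358";
        double[] avg = averagePauses(line);
        System.out.println("average young collection (s): " + avg[0]);
        System.out.println("average full collection (s):  " + avg[1]);
    }
}
```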
99. OK, so we can measure... temperature!!
101. But... a single temperature measurement is not enough to diagnose anything!
We must archive all measurements to know the baseline!
Credit: http://www.lhup.edu/mkhalequ/fieldtrip/geos253.htm
102. Therefore we must persist all measurements!
• JMX + jmxtrans
• RRD
• Graphite
• etc.
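As a rough in-process illustration of what such a pipeline records, the sketch below reads the standard GarbageCollectorMXBean counters and formats them as Graphite plaintext lines ("path value timestamp"). This is not how the talk's setup worked: jmxtrans polls JMX from outside the JVM. The metric prefix "shop.jvm1" and the naming scheme are assumptions made up for the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcMetrics {
    // Graphite plaintext format: "<metric.path> <value> <unix-timestamp-seconds>"
    static String graphiteLine(String path, long value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds;
    }

    // Hypothetical naming scheme: <prefix>.gc.<collector>.count and .time_ms
    static List<String> sample(String prefix, long epochSeconds) {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collector names contain spaces (e.g. "PS MarkSweep"); make them path-safe.
            String name = gc.getName().replaceAll("[^A-Za-z0-9]", "_");
            lines.add(graphiteLine(prefix + ".gc." + name + ".count",
                    gc.getCollectionCount(), epochSeconds));
            lines.add(graphiteLine(prefix + ".gc." + name + ".time_ms",
                    gc.getCollectionTime(), epochSeconds));
        }
        return lines;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        for (String line : sample("shop.jvm1", now)) {
            // A real setup would send each line to the metrics store instead of printing it.
            System.out.println(line);
        }
    }
}
```

Both counters are cumulative since JVM start, so the graphs of interest are their rates of change over time.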
103. Operating the (many) switches only makes sense...
Credit: http://www.our-energy.com
104. ...if we can measure/compare the effects!
[Graph: cputime, before vs. after]
108. JVM
Tomcat
vs.
Application
(code)
109. 1. Code
• Tuning the JVM cannot compensate for bad code
• Rules of thumb
• Immutability = object reuse = fewer allocations *
• Move code invariants out of tight loops
• Know the characteristics of your data structures & frameworks (java.util, Guava, Hibernate, etc.)
• Mind the gap: data structure overhead can kill you!
* But... pooling can be counter-productive!
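A small sketch of the "move invariants out of tight loops" rule, using an example picked for this write-up rather than code from the talk: String.matches compiles its regular expression on every call, so inside a loop it allocates a new Pattern per iteration, while a Pattern compiled once can be reused. The SKU format is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class LoopInvariant {
    // Allocation-heavy: the regex is recompiled for every element.
    static int countSkusSlow(List<String> values) {
        int n = 0;
        for (String v : values) {
            if (v.matches("[A-Z]+-\\d+")) {
                n++;
            }
        }
        return n;
    }

    // The pattern never changes between iterations, so compile it once.
    private static final Pattern SKU = Pattern.compile("[A-Z]+-\\d+");

    static int countSkusFast(List<String> values) {
        int n = 0;
        for (String v : values) {
            if (SKU.matcher(v).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("SHOE-42", "not a sku", "HAT-7");
        System.out.println(countSkusSlow(values) + " " + countSkusFast(values));
    }
}
```

Both methods return the same result; the second one simply produces less short-lived garbage per request.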
112. Example: HashMap
HashMap      48 bytes
Entry[16]    80 bytes
Entry        32 bytes (references key and value)
Overhead = 160 bytes!
• SingletonMap (40 bytes)
• initialCapacity + loadFactor
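The 160 bytes are the sum of the three figures on the slide (48 + 80 + 32) for a default HashMap holding a single entry. The sketch below shows the two alternatives the slide names: Collections.singletonMap for one-entry maps (it is immutable), and an explicit initialCapacity and loadFactor when the expected size is known. The helper initialCapacityFor is an assumption of this example, not a JDK method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class MapSizing {
    // Overhead figures taken from the slide: HashMap 48 + Entry[16] 80 + Entry 32.
    static int defaultOverheadBytes() {
        return 48 + 80 + 32;
    }

    // Hypothetical helper: smallest initialCapacity that holds expectedEntries
    // without a resize at the given load factor.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        // One entry only: no table array, no separate Entry object.
        Map<String, String> one = Collections.singletonMap("sku", "SHOE-42");

        // Three entries expected: ask for a small table instead of the default 16 slots.
        Map<String, String> small =
                new HashMap<String, String>(initialCapacityFor(3, 0.75f), 0.75f);
        small.put("sku", "SHOE-42");
        small.put("color", "black");
        small.put("size", "42");

        System.out.println(defaultOverheadBytes() + " " + one.size() + " " + small.size());
    }
}
```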
113. Fewer allocations...
[Graph: young GCs per second]
114. ... saves CPU
[Graph: CPU load]
116. 2. Tomcat
• Pooling
• JSP tags: enablePooling in web/webdefault.xml
• -Dorg.apache.jasper.runtime.JspFactoryImpl.USE_POOL=false
! Pooling may lead to Old fragmentation!
• Careful with buffers and their reuse
• -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
• JSP usage is a factor in PermGen requirements
• Test & Measure, always!
119. 3. The JVM
[Graph 1: heap size over time, annotated "pause > 1s!"]
[Graph 2: heap size over time, annotated "Frequent GC"]
126. First mistake: setting the Young too small
• Young fills up quickly = many Young GCs
• Objects promoted to Tenured too fast = many Old GCs
[Diagram: Old and Young generations]
136. Old generation: ideal vs. real
Rate increases
138. Things to watch for
• Traffic/Load variance
• Traffic increases => memory pressure increases
• CMS requires some headroom to operate properly
• Several phases are concurrent, i.e. they run at the same time as new objects are allocated
(concurrent mode failure): 2165740K->1284261K(2228224K), 8.9411250 secs
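The log excerpt can be read as before->after(capacity), duration. Assuming the triple refers to the old generation, as is usual for this entry, the generation was about 97% full (2165740K of 2228224K) when the collection started, and the pause lasted almost 9 seconds. The sketch below extracts those numbers with a regular expression; the class name and the pattern are assumptions written for this one excerpt, not a general GC log parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcLogLine {
    // Matches "<before>K-><after>K(<capacity>K), <seconds> secs"
    private static final Pattern SIZES =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs");

    final long beforeK;
    final long afterK;
    final long capacityK;
    final double seconds;

    private GcLogLine(long beforeK, long afterK, long capacityK, double seconds) {
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.capacityK = capacityK;
        this.seconds = seconds;
    }

    static GcLogLine parse(String line) {
        Matcher m = SIZES.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no size triple in: " + line);
        }
        return new GcLogLine(
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Double.parseDouble(m.group(4)));
    }

    // Fraction of the capacity in use when the collection started.
    double occupancyBefore() {
        return (double) beforeK / capacityK;
    }

    public static void main(String[] args) {
        GcLogLine l = parse(
                "(concurrent mode failure): 2165740K->1284261K(2228224K), 8.9411250 secs");
        System.out.println("occupancy before: " + l.occupancyBefore());
        System.out.println("pause seconds:    " + l.seconds);
    }
}
```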
140. Giving CMS some room to operate
CMSInitiatingOccupancyFraction = 92%
This is the default...
[Diagram: Old and Young generations]
141. Giving CMS some room to operate
We really need 75-80%
UseCMSInitiatingOccupancyOnly forces the JVM to consider only this criterion
[Diagram: Old and Young generations]
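A quick worked example of what the percentage means in headroom. It borrows the 2228224K capacity from the concurrent mode failure log line shown earlier and assumes that figure is the old generation size. On the command line the two settings are spelled -XX:CMSInitiatingOccupancyFraction=75 and -XX:+UseCMSInitiatingOccupancyOnly; the class below is only arithmetic and does not touch the JVM configuration.

```java
final class CmsHeadroom {
    // Occupancy (in KB) at which a CMS cycle would start for a given percentage.
    static long triggerK(long oldCapacityK, int occupancyPercent) {
        return oldCapacityK * occupancyPercent / 100;
    }

    // Space (in KB) still free in the old generation when the cycle starts.
    static long headroomK(long oldCapacityK, int occupancyPercent) {
        return oldCapacityK - triggerK(oldCapacityK, occupancyPercent);
    }

    public static void main(String[] args) {
        long oldCapacityK = 2228224L; // capacity taken from the log line in the slides

        System.out.println("92%: headroom " + headroomK(oldCapacityK, 92) + "K");
        System.out.println("75%: headroom " + headroomK(oldCapacityK, 75) + "K");
    }
}
```

At 92% the collector has about 174 MB left to absorb promotions while it runs concurrently; at 75% it has about 544 MB, roughly three times as much.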
153. What’s next?
• Survivors tuning (S0 & S1)
• Size, ratio vs. Eden, max generation
• G1
• Principles and operations are radically different!
• Other JVMs: JRockit, Azul, IBM
• Check tuning validity after every code change!
• Measure, measure, measure!