One of the key strengths of the JVM is automatic memory management (garbage collection), and understanding it can help in writing better applications. This becomes all the more important as enterprise server applications have large amounts of live heap data and many parallel threads. Until recently, the main collectors were the parallel collector and the concurrent mark-sweep (CMS) collector. This presentation introduces the various garbage collectors and compares the CMS collector against its replacement, G1, a new implementation in Java 7. G1 is characterized by a single contiguous heap that is split into same-sized regions. In fact, if your application is still running on a 1.5 or 1.6 JVM, being able to use G1 is a compelling argument for upgrading to Java 7.
5. Generational Hypothesis
• Most objects die young
• Only a few live very long
• The longer they live, the more likely they are to live even longer
• Old objects rarely reference young objects
8. CMS operations in Young Generation (i)
• Young Generation
• 1 Eden and 2 Survivor Spaces
• Old Generation
• Compacted only at Full GC
9. CMS operations in Young Generation (ii)
• Young Generation Collection
• Stop the World Pause
• Live objects from young generation moved to
• Other survivor space
• Old Generation
• Old Generation
10. CMS operations in Young Generation (iii)
• After Young Generation GC
• Eden and 1 Survivor Space are empty
• Objects promoted to old generation
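The young-generation steps above can be sketched as a toy copying collector. This is a simplified model, not HotSpot code: object liveness is a flag set by the caller instead of being discovered by tracing from roots, and the promotion age (tenuringThreshold) is a hypothetical parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a stop-the-world copying young GC with one eden and two
// survivor spaces. Liveness is a caller-set flag (assumption); a real
// collector discovers it by tracing from roots.
public class YoungGenSketch {
    static final class Obj {
        final String name;
        boolean live; // hypothetical: whether the object is still reachable
        int age;      // number of young GCs survived so far
        Obj(String name, boolean live) { this.name = name; this.live = live; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space holding previous survivors
    List<Obj> to = new ArrayList<>();   // empty survivor space
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;        // hypothetical promotion age

    YoungGenSketch(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    void youngGc() {
        List<Obj> scanned = new ArrayList<>(eden);
        scanned.addAll(from);
        for (Obj o : scanned) {
            if (!o.live) continue;                      // garbage is never copied
            o.age++;
            if (o.age >= tenuringThreshold) old.add(o); // promote to old generation
            else to.add(o);                             // copy to the other survivor space
        }
        eden.clear();
        from.clear();
        // Swap roles so that eden and one survivor space are empty afterwards.
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }
}
```

Because only live objects are copied, the cost of a young GC grows with the number of survivors rather than with the size of eden, which is where the generational hypothesis pays off.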
11. CMS operations in Old Generation (i)
• Mark Phases
• Initial Mark (STW)
• Concurrent Mark
• Remark (STW)
12. CMS operations in Old Generation (ii)
• Concurrent Sweeping Phase
• Collects objects identified as unreachable during marking phases
• In-place de-allocation of unreachable objects
13. CMS operations in Old Generation (iii)
• Resetting
• All unmarked objects de-allocated
• Prepare for next concurrent collection by clearing data structures
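The marking, sweeping and resetting phases can be illustrated with a toy mark-sweep over a fixed array of slots. It ignores concurrency and the STW pauses entirely, and the reference graph and roots are hypothetical inputs. Because unreachable slots are freed in place without compaction, it also shows how the fragmentation listed among the CMS challenges arises.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy old-generation mark-sweep in the spirit of CMS: mark from roots,
// sweep unmarked slots in place (no compaction), then reset the mark bits.
public class MarkSweepSketch {
    final int[][] refs;        // refs[i] = slots referenced by the object in slot i
    final boolean[] allocated; // true while a slot holds an object
    final boolean[] marked;    // mark bits, cleared in the reset phase

    MarkSweepSketch(int[][] refs) {
        this.refs = refs;
        this.allocated = new boolean[refs.length];
        Arrays.fill(allocated, true);
        this.marked = new boolean[refs.length];
    }

    // Marking: everything reachable from the roots is live.
    void mark(int... roots) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) stack.push(r);
        while (!stack.isEmpty()) {
            int i = stack.pop();
            if (marked[i]) continue;
            marked[i] = true;
            for (int j : refs[i]) stack.push(j);
        }
    }

    // Sweeping: free every unmarked slot where it lies; returns the freed slots.
    List<Integer> sweep() {
        List<Integer> freed = new ArrayList<>();
        for (int i = 0; i < allocated.length; i++) {
            if (allocated[i] && !marked[i]) {
                allocated[i] = false;
                freed.add(i);
            }
        }
        return freed;
    }

    // Resetting: clear marking data before the next cycle.
    void reset() { Arrays.fill(marked, false); }

    // Longest run of contiguous free slots, i.e. the largest allocation that fits.
    int largestFreeRun() {
        int best = 0, run = 0;
        for (boolean a : allocated) {
            run = a ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }
}
```

With refs {{1}, {}, {}, {4}, {}, {}} and roots 0 and 3, the sweep frees slots 2 and 5: two slots are free, but an object needing two contiguous slots no longer fits.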
14. CMS Challenges
• Stop the World Pause (Remark phase)
• Very Large Heaps
• Fragmentation
• Hard to tune
15. Introducing G1
• Concurrent
• Refinement, Marking, Cleanup
• Parallel
• STW Pauses
• Full GC is single threaded
• Compacting
16. G1 Goals
• Low Latency
• Better Predictability
• Easy to use & tune
• Move away from the current situation of three different GC frameworks
17. G1 Heap Overview
• Single large contiguous space divided into fixed-size regions (~2000)
• No physical separation between young and old generation
• Objects moved between regions during collections
• Humongous Regions for large objects
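The region layout can be made concrete with a little arithmetic. The sketch below picks a power-of-two region size so the heap splits into roughly 2048 regions, clamped to the 1 MB to 32 MB range given in the notes, and treats an object larger than half a region as humongous. The target of 2048 and the rounding rule are assumptions modelled loosely on HotSpot's heuristic; the real JVM may size regions differently, and -XX:G1HeapRegionSize overrides the choice.

```java
// Sketch of G1-style region sizing and humongous-object classification.
// TARGET_REGIONS and the rounding rule are assumptions, not HotSpot's exact code.
public class RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;   // smallest region size
    static final long MAX_REGION = 32 * MB;  // largest region size
    static final long TARGET_REGIONS = 2048; // assumed target ("~2000 regions")

    // Power-of-two region size giving roughly TARGET_REGIONS regions.
    static long regionSize(long heapBytes) {
        long size = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        size = Long.highestOneBit(size); // round down to a power of two
        return Math.min(size, MAX_REGION);
    }

    static long regionCount(long heapBytes) {
        long region = regionSize(heapBytes);
        return (heapBytes + region - 1) / region; // round up
    }

    // Objects larger than half a region are humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // A humongous object occupies this many contiguous regions.
    static long humongousRegions(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }
}
```

Under these assumptions a 4096 MB heap gets 2 MB regions (2048 of them), while a 512 MB heap is clamped to 1 MB regions.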
18. G1 - Young Generation GC
• Live objects evacuated (copied/moved) to
• One or more survivor regions
• Old regions
• STW Pause
• Done in parallel with multiple threads
• Eden size and survivor size calculated for next young GC cycle
19. G1 - Old Generation GC
• Initial Marking Phase
• Piggybacked on Young Generation GC
• STW Pause
20. G1 - Old Generation GC
• Concurrent Marking Phase
• Calculates liveness information per region
• Empty regions can be reclaimed easily (denoted as X)
21. G1 - Old Generation GC
• Remark Phase
• Completes marking of live objects in heap
• Empty regions removed and reclaimed
• STW Pause
• Region liveness known for all other old generation regions
22. G1 - Old Generation GC
• Copying/Cleanup Phase
• Select regions with low liveness
• Collect (some) during next Young GC
23. G1 - Old Generation GC
• After Copying/Cleanup Phase
• Selected regions collected and compacted
• Some garbage objects may be left in old generation regions
24. Summary - G1 Old Generation GC
• Concurrent Marking Phase
• Calculates liveness information per region, concurrently while the application is running
• Identifies best regions for subsequent evacuation phases
• No corresponding sweeping phase
• Remark Phase
• Different marking algorithm from CMS
• Uses Snapshot-at-the-beginning (SATB), which is much faster than the incremental-update marking used in CMS
• Completely empty regions are reclaimed
• Copying/Cleanup Phase
• Young generation and old generation reclaimed at the same time
• Old generation regions selected based on their liveness
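The copying/cleanup logic (pick old regions by liveness, then collect only as many as the pause budget allows) can be sketched as a greedy selection. The liveness cut-off plays the role of -XX:G1OldCSetRegionLiveThresholdPercent; the per-region cost estimates, the fixed young-generation cost and the greedy stop rule are hypothetical simplifications of G1's pause-time prediction.

```java
import java.util.ArrayList;
import java.util.List;

// Toy "garbage first" choice of old regions for a mixed collection.
public class CSetSelection {
    static final class Region {
        final int id;
        final int livePercent;     // liveness from concurrent marking
        final double predictedMs;  // hypothetical cost to evacuate this region
        Region(int id, int livePercent, double predictedMs) {
            this.id = id;
            this.livePercent = livePercent;
            this.predictedMs = predictedMs;
        }
    }

    static List<Region> chooseOldRegions(List<Region> oldRegions, int liveThresholdPercent,
                                         double pauseTargetMs, double youngCostMs) {
        // 1. Regions with too much live data are not worth copying.
        List<Region> candidates = new ArrayList<>();
        for (Region r : oldRegions) {
            if (r.livePercent < liveThresholdPercent) candidates.add(r);
        }
        // 2. Most garbage (least live data) first.
        candidates.sort((a, b) -> Integer.compare(a.livePercent, b.livePercent));
        // 3. Young regions are always collected; old regions fill the remaining budget.
        double budgetMs = pauseTargetMs - youngCostMs;
        List<Region> cset = new ArrayList<>();
        for (Region r : candidates) {
            if (r.predictedMs > budgetMs) break; // the rest wait for a later mixed GC
            budgetMs -= r.predictedMs;
            cset.add(r);
        }
        return cset;
    }
}
```

In the test data, a 90%-live region is skipped outright and a 40%-live region is left for a later collection, which is how some garbage can remain in old generation regions.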
25. G1 and CMS Comparison
Feature | G1 GC | CMS GC
Concurrent and generational | Yes | Yes
Releases max heap memory after usage | Yes | No
Low latency | Yes | Yes
Throughput | Higher | Lower
Compaction | Yes | No
Predictability | More | Less
Physical separation between young and old | No | Yes
26. Footprint Overhead
• For the same application size, the heap is likely to be larger with G1 than with CMS because of additional accounting data structures
• Remembered Sets (RSets / RSet)
• Track object references into a given region
• Footprint overhead less than 5%
• Caution
• More inter-region references => Bigger Remembered Set
• Large Remembered Set => Slow GC
• Collection Sets (CSets / CSet)
• Set of regions that will be collected in a GC
• Footprint overhead less than 1%
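A remembered set can be sketched as a per-region set of the regions that hold references into it, filled in on every cross-region reference store. This toy version works on plain integer addresses and records source regions; real G1 records cards and updates remembered sets through a post-write barrier and concurrent refinement threads. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy remembered sets: for each region, which other regions point into it.
public class RememberedSets {
    final long regionSize;
    final Map<Integer, Set<Integer>> rsets = new HashMap<>();

    RememberedSets(long regionSize) { this.regionSize = regionSize; }

    int regionOf(long address) { return (int) (address / regionSize); }

    // Stand-in for a write barrier on "field at fieldAddr = reference to targetAddr".
    void onReferenceStore(long fieldAddr, long targetAddr) {
        int from = regionOf(fieldAddr);
        int to = regionOf(targetAddr);
        if (from == to) return; // references within a region need no tracking
        rsets.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    // Regions to scan for incoming references when 'region' is collected,
    // instead of scanning the whole heap.
    Set<Integer> incoming(int region) {
        return rsets.getOrDefault(region, Collections.<Integer>emptySet());
    }
}
```

The more inter-region references a program creates, the more entries these sets hold, which is the footprint and GC-time caution above.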
27. Command Line Options
• -XX:+UseG1GC
• Tells the JVM to use G1 Garbage Collector
• -XX:MaxGCPauseMillis=200
• Sets target for the maximum GC pause time
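To confirm which collector a JVM is actually running, the standard java.lang.management API can list the active collectors. The bean names are implementation-specific; on HotSpot with -XX:+UseG1GC they are usually "G1 Young Generation" and "G1 Old Generation", and with -XX:+UseConcMarkSweepGC they are "ParNew" and "ConcurrentMarkSweep".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors of the running JVM with their counts and times.
public class WhichGc {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

For example, run it as java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 WhichGc and compare the output with a run that omits -XX:+UseG1GC.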
28. G1 GC Tuning Options (i)
• Main goal is latency
• If latency is not a problem, use Parallel GC
• Related goal is simplified tuning
• Most important tuning option
• -XX:MaxGCPauseMillis=200 (default value = 200 ms)
• Influences maximum amount of work per collection
• Best effort only
29. G1 GC Tuning Options (ii)
• -XX:InitiatingHeapOccupancyPercent=n
• Heap occupancy that triggers a concurrent marking cycle
• Percent of the entire heap, not just the old generation
• -XX:G1OldCSetRegionLiveThresholdPercent=n
• Threshold for region to be included in a Collection Set
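Because IHOP is a percentage of the whole heap, the occupancy at which marking starts is simple arithmetic. The sketch below reads the value from a running HotSpot JVM through com.sun.management.HotSpotDiagnosticMXBean (a HotSpot-specific interface), using the flag's HotSpot spelling, InitiatingHeapOccupancyPercent. It treats Runtime.maxMemory() as an approximation of the maximum heap size, which is an assumption.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Works out the heap occupancy at which G1 would start a concurrent marking cycle.
public class IhopCheck {
    // Percent of the *entire* heap, not just the old generation.
    static long ihopThresholdBytes(long maxHeapBytes, int percent) {
        return maxHeapBytes * percent / 100;
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        int ihop = Integer.parseInt(
                hotspot.getVMOption("InitiatingHeapOccupancyPercent").getValue());
        long maxHeap = Runtime.getRuntime().maxMemory(); // approximation of -Xmx
        long mb = 1024L * 1024L;
        System.out.printf("IHOP %d%% of %d MB heap: marking starts near %d MB%n",
                ihop, maxHeap / mb, ihopThresholdBytes(maxHeap, ihop) / mb);
    }
}
```

With a 4096 MB heap and IHOP 45 (the HotSpot default), marking would start at roughly 1843 MB of total heap occupancy.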
30. G1 GC Tuning Options (iii)
• -XX:G1MixedGCCountTarget=n
• Target number of mixed GCs per concurrent marking cycle
• Precaution
• Fixing the young generation size (-Xmn) can cause the pause time target to be ignored
• G1 no longer respects the pause time target
• Even if the heap expands, the young generation size stays fixed
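One way to read G1MixedGCCountTarget: if marking leaves C candidate old regions and G1 aims to spread them over n mixed GCs, each mixed GC has to take at least ceil(C / n) of them. This is a simplified model assumed here rather than taken from the JVM source, and the pause time target still influences the real numbers; it shows why a low target means more old regions per pause, and hence longer pauses.

```java
// Back-of-the-envelope model of -XX:G1MixedGCCountTarget=n (simplified assumption).
public class MixedGcSpread {
    // Minimum old regions per mixed GC needed to finish C candidates in n mixed GCs.
    static int minOldRegionsPerMixedGc(int candidateRegions, int countTarget) {
        int n = Math.max(countTarget, 1);      // guard against n = 0
        return (candidateRegions + n - 1) / n; // ceiling division
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 8, 16}) {
            System.out.printf("n=%d -> at least %d old regions per mixed GC%n",
                    n, minOldRegionsPerMixedGc(120, n));
        }
    }
}
```

With 120 hypothetical candidate regions, a target of 8 means at least 15 old regions per mixed GC, while a target of 1 would try to evacuate all 120 in one pause.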
31. G1 Logging (i)
• Three different log levels
• Log level as fine – Use -verbose:gc (equivalent to -XX:+PrintGC)
• Sample Output
[GC pause (G1 Humongous Allocation) (young) (initial-mark) 24M->21M(64M), 0.2349730 secs]
[GC pause (G1 Evacuation Pause) (mixed) 66M->21M(236M), 0.1625268 secs]
• Log level as finer – Use -XX:+PrintGCDetails
• Average, Min, and Max time displayed for each phase
• Root Scan, RSet Updating (with processed buffers information), RSet Scan, Object Copy, Termination (with number of attempts)
• Also shows “other” time such as time spent choosing CSet, reference processing, reference enqueuing and freeing CSet
• Shows the Eden, Survivors and Total Heap occupancies.
• Sample Output
[Ext Root Scanning (ms): Avg: 1.7 Min: 0.0 Max: 3.7 Diff: 3.7]
[Eden: 818M(818M)->0B(714M) Survivors: 0B->104M Heap: 836M(4096M)->409M(4096M)]
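The fine-level lines are regular enough to post-process. This sketch pulls the pause type, heap occupancy before and after, heap capacity and pause time out of lines like the fine-level samples above. The regex is written against those samples only and assumes sizes in M; other JVM versions, units or log levels would need a different pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses fine-level G1 pause lines such as
// [GC pause (G1 Evacuation Pause) (mixed) 66M->21M(236M), 0.1625268 secs]
public class FineLogLine {
    private static final Pattern PAUSE = Pattern.compile(
            "\\[GC pause (.+?) (\\d+)M->(\\d+)M\\((\\d+)M\\), ([0-9.]+) secs\\]");

    static final class Pause {
        final String kind;
        final int beforeMb, afterMb, heapMb;
        final double secs;
        Pause(String kind, int beforeMb, int afterMb, int heapMb, double secs) {
            this.kind = kind;
            this.beforeMb = beforeMb;
            this.afterMb = afterMb;
            this.heapMb = heapMb;
            this.secs = secs;
        }
        int reclaimedMb() { return beforeMb - afterMb; }
    }

    // Returns null when the line is not a fine-level pause entry.
    static Pause parse(String line) {
        Matcher m = PAUSE.matcher(line);
        if (!m.find()) return null;
        return new Pause(m.group(1),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)),
                Double.parseDouble(m.group(5)));
    }
}
```

For the mixed pause sample this yields 66 MB before and 21 MB after, so 45 MB reclaimed in about 0.16 s. Using find() rather than matches() lets the same pattern skip a leading timestamp such as "1.729: ".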
32. G1 Logging (ii)
• Log level as finest – Use -XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest
• Like finer but includes individual worker thread information.
• Sample Output
[Ext Root Scanning (ms): 2.1 2.4 2.0 0.0 Avg: 1.6 Min: 0.0 Max: 2.4 Diff: 2.3]
[Update RS (ms): 0.4 0.2 0.4 0.0 Avg: 0.2 Min: 0.0 Max: 0.4 Diff: 0.4]
[Processed Buffers : 5 1 10 0 Sum: 16, Avg: 4, Min: 0, Max: 10, Diff: 10]
• Determine Time – How time is displayed in GC logs
-XX:+PrintGCTimeStamps - Shows the elapsed time since the JVM started
1.729: [GC pause (young) 46M->35M(1332M), 0.0310029 secs]
-XX:+PrintGCDateStamps - Adds a time-of-day prefix to each entry
2012-05-02T11:16:32.057+0200: [GC pause (young) 46M->35M(1332M), 0.0317225 secs]
33. G1 Logging Keywords (i)
• Parallel Time - Overall elapsed time of the main parallel part of the pause
• Worker Start – Timestamp at which the workers start
• Note: Per-thread values are ordered by thread id, so each column refers to the same worker in every entry
414.557: [GC pause (young), 0.03039600 secs] [Parallel Time: 22.9 ms]
[GC Worker Start (ms): 7096.0 7096.0 7096.1 7096.1 7096.1 7096.1 7096.1 7096.1 7096.2 7096.2 7096.2 7096.2 Avg: 7096.1, Min: 7096.0, Max: 7096.2, Diff: 0.2]
• External Root Scanning - The time taken to scan the external roots (e.g., things like the system dictionary that point into the heap)
[Ext Root Scanning (ms): 3.1 3.4 3.4 3.0 4.2 2.0 3.6 3.2 3.4 7.7 3.7 4.4 Avg: 3.8, Min: 2.0, Max: 7.7, Diff: 5.7]
• Update Remembered Set - Buffers that are completed but have not yet been processed by the concurrent refinement thread before the start of the pause have to be updated.
• Time depends on the density of the cards: the more cards, the longer it takes.
[Update RS (ms): 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1]
[Processed Buffers : 26 0 0 0 0 0 0 0 0 0 0 0 Sum: 26, Avg: 2, Min: 0, Max: 26, Diff: 26]
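Because finer and finest entries print one value per GC worker thread followed by a summary, a large Diff points at a straggling thread. The helper below recomputes Avg, Min, Max and Diff from the raw values of an entry in the "(ms):" format shown above; it is a hypothetical utility, not part of any JVM tool, and it does not handle the "Processed Buffers" entries.

```java
import java.util.ArrayList;
import java.util.List;

// Recomputes the Avg/Min/Max/Diff summary of a per-worker G1 log entry.
public class WorkerStats {
    // Extracts the per-thread values between "(ms):" and "Avg:".
    static List<Double> values(String entry) {
        int marker = entry.indexOf("(ms):");
        int end = entry.indexOf("Avg:");
        if (marker < 0 || end < marker) {
            throw new IllegalArgumentException("not a per-worker (ms) entry: " + entry);
        }
        String raw = entry.substring(marker + "(ms):".length(), end).trim();
        List<Double> out = new ArrayList<>();
        for (String tok : raw.split("\\s+")) {
            out.add(Double.parseDouble(tok));
        }
        return out;
    }

    // Returns {avg, min, max, diff}.
    static double[] summary(List<Double> values) {
        double sum = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new double[] {sum / values.size(), min, max, max - min};
    }
}
```

For the External Root Scanning sample this reproduces Avg 3.8 (about 3.76 before rounding), Min 2.0, Max 7.7 and Diff 5.7, with the 7.7 ms thread as the outlier.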
34. G1 Logging Keywords (ii)
• Scanning Remembered Sets - Look for pointers that point into the Collection Set
[Scan RS (ms): 0.4 0.2 0.1 0.3 0.0 0.0 0.1 0.2 0.0 0.1 0.0 0.0 Avg: 0.1, Min: 0.0, Max: 0.4, Diff: 0.3]
• Object Copy - The time that each individual thread spent copying and evacuating objects
[Object Copy (ms): 16.7 16.7 16.7 16.9 16.0 18.1 16.5 16.8 16.7 12.3 16.4 15.7 Avg: 16.3, Min: 12.3, Max: 18.1, Diff: 5.8]
• Termination Time - When a worker thread finishes its particular set of objects to copy and scan, it enters the termination protocol. It looks for work to steal, and once it is done with that work it enters the termination protocol again. Termination Attempts counts all the attempts to steal work.
[Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
[Termination Attempts : 1 1 1 1 1 1 1 1 1 1 1 1 Sum: 12, Avg: 1, Min: 1, Max: 1, Diff: 0]
• GC Worker End
[GC Worker End (ms): 7116.4 7116.3 7116.4 7116.3 7116.4 7116.3 7116.4 7116.4 7116.4 7116.4 7116.3 7116.3 Avg: 7116.4, Min: 7116.3, Max: 7116.4, Diff: 0.1]
• GC worker end time – Timestamp when the individual GC worker stops.
• GC worker time – Time taken by individual GC worker thread.
35. G1 Logging Keywords (iii)
• GC Worker Other - The time (for each GC thread) that can't be attributed to the worker phases listed previously. Should be quite low.
[GC Worker Other (ms): 2.6 2.6 2.7 2.7 2.7 2.7 2.7 2.8 2.8 2.8 2.8 2.8 Avg: 2.7, Min: 2.6, Max: 2.8, Diff: 0.2]
• Clear CT - Time taken to clear the card table of RSet scanning meta-data
[Clear CT: 0.6 ms]
• Other - Time taken for various other sequential phases of the GC pause.
[Other: 6.8 ms]
• Choose CSet - Time taken finalizing the set of regions to collect. Usually very small; slightly longer when old regions have to be selected
[Choose CSet: 0.1 ms]
• Ref Proc - Time spent processing soft, weak, etc. references deferred from the
prior phases of the GC.
[Ref Proc: 4.4 ms]
• Ref Enq - Time spent placing soft, weak, etc. references on to the pending list.
[Ref Enq: 0.1 ms]
• Free CSet - Time spent freeing the set of regions that have just been collected, including their remembered sets
[Free CSet: 2.0 ms]
36. G1 Evacuation Failure
• Promotion failure when the JVM runs out of heap regions during the GC
• Indicated by “to-space overflow” in PrintGCDetails log
• Very expensive operation
37. Sample Application Test
• Sample Application
Create and add 190 float arrays into an ArrayList
Each float array reserves 4 MB of memory, i.e. 1024 x 1024 floats x 4 bytes = 4 MB
4 MB x 190 = 760 MB
After each iteration the arrays are released and the application sleeps for some time
The same steps are repeated a certain number of times
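The sample application's source is not listed in the slides, so the following is a hypothetical reconstruction from the description above. The class name GCTest and the array-count argument come from the command lines that follow; the iteration count (ITERATIONS) and sleep time (SLEEP_MILLIS) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of GCTest: allocate N x 4 MB float arrays,
// release them, sleep, and repeat.
public class GCTest {
    static final int FLOATS_PER_ARRAY = 1024 * 1024; // 1024 x 1024 floats x 4 bytes = 4 MB
    static final int ITERATIONS = 10;                // assumption
    static final long SLEEP_MILLIS = 5_000;          // assumption

    // Allocates 'count' 4 MB float arrays and keeps them reachable from a list.
    static List<float[]> allocate(int count) {
        List<float[]> arrays = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            arrays.add(new float[FLOATS_PER_ARRAY]);
        }
        return arrays;
    }

    public static void main(String[] args) throws InterruptedException {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 190;
        for (int iter = 0; iter < ITERATIONS; iter++) {
            List<float[]> arrays = allocate(count); // about count x 4 MB live at peak
            System.out.printf("iteration %d: %d arrays (~%d MB)%n",
                    iter, arrays.size(), arrays.size() * 4);
            arrays = null;              // release the arrays so they become garbage
            Thread.sleep(SLEEP_MILLIS); // idle period before the next iteration
        }
    }
}
```

It can be started with either command line below and observed in VisualVM during the allocation and sleep phases.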
38. Observations for CMS
• Command Line Arguments
java -server -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -Xloggc:CMS.log
-Dcom.sun.management.jmxremote.port=3333
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false -classpath
C:\Users\gusachde\workspace\Memory\bin GCTest 190
• Observations with VisualVM
39. Observations for G1
• Command Line Arguments
java -server -XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -Xloggc:G1GC.log
-Dcom.sun.management.jmxremote.port=3333
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false -classpath
C:\Users\gusachde\workspace\Memory\bin GCTest 190
• Observations with VisualVM
40. Results Comparison
• G1 GC is able to reclaim the max heap size
• CMS is not able to do so
• Lower CPU utilization for G1 collection
• G1 heap grows to max size in three distinct jumps
• CMS seems to reach max heap size in the initial jump
Parameter | G1 GC | CMS GC
Time taken for execution | 7 min 5 sec | 7 min 56 sec
Max CPU usage | 27.3% | 70.2%
Max GC activity | 2% | 24%
Max heap size | 974 MB | 974 MB
Max used heap size | 763 MB | 779 MB
41. Is G1 for You?
• Evaluate all other options before moving to G1
• Don’t need Low Latency
• Use Parallel GC
• Don’t need big heap
• Use small heap and Parallel GC
• Need big heap
• Try CMS
• If CMS not performing well => Tune it
• If tuned CMS not performing well => Tune it further
• If problem still persists => Check whether you require such a big heap and low pauses
• Start using G1
• Test before deploying in production
Split the heap into regions
Create new objects in Young Generation
Move mature objects to Old Generation
Different strategies for different regions
Serial Collector
Both Young and Old collections done serially
Parallel
Young generation collection done in parallel using multiple CPUs
CMS
Most of the garbage collection work done concurrently with the application threads.
G1
Supported since 7u4
Server-style garbage collector, targeted at multiprocessor machines with large heaps
To replace CMS in the long term
Single large contiguous space divided into fixed size regions (~ 2000)
Region size chosen at startup (size 1 MB to 32 MB)
No physical separation between young and old generation
Not required to be contiguous
A region may act as an eden, survivor, or old generation region
Objects moved between regions during collections
Humongous Regions for large objects
Multiple contiguous regions for large objects (> 50% region size)
Collection not optimized!
Collect (some) during next Young GC
Number of old regions collected depends on liveness information, predicted time to evacuate the space and pause time target
Some garbage objects may be left in old generation regions
Regions with high liveness
They may be collected later based on future liveness, pause time target and number of unused regions
Remembered Sets (RSets / RSet)
Track object references into a given region
One per region
Enables parallel and independent collection of a region
No need to track whole heap to find references
Footprint overhead less than 5%
Caution
More inter-region references => Bigger Remembered Set
Large Remembered Set => Slow GC
Collection Sets (CSets / CSet)
Set of regions that will be collected in a GC
Regions can be eden and survivor regions and, after (concurrent) marking, optionally some old generation regions
All live data in a CSet is evacuated (copied/moved) during the GC
Footprint overhead less than 1%
-XX:InitiatingHeapOccupancyPercent=n
Heap occupancy that triggers a concurrent marking cycle
Percent of entire heap not just old generation
Automatic resizing of the young generation has lower and upper bounds of 20% and 80% of the Java heap, respectively
Caution
Too Low => Unnecessary GC overhead
Too High => “to-space overflow” => Full GC
-XX:G1OldCSetRegionLiveThresholdPercent=n
Threshold for region to be included in a Collection Set
Caution
Too high => More aggressive collecting => More live objects to copy
Too low => Wasting some heap
-XX:G1MixedGCCountTarget=n
How many Mixed GC / Concurrent Cycle
Caution
Too high => Unnecessary overhead
Too low => Longer pauses
Promotion Failure when JVM runs out of heap regions during the GC
For either survivor or promoted objects
Heap is already at maximum
Indicated by “to-space overflow” in PrintGCDetails log
Very expensive operation
GC still has to continue
Unsuccessfully copied objects have to be tenured in place
Any updates to RSets of regions in CSet have to be regenerated
Prevention
Increase heap size
More marking threads