Jean-Philippe Bempel
@jpbempel
Understanding
Low Latency
JVM GCs
2 •
• Concurrent Marking
• Shenandoah
• Azul’s C4
• ZGC
• How to choose a GC algorithm?
Understanding JVM GC: Advanced!
Concurrent Marking
4 •
• Used in CMS & G1 algorithms already and by all low latency GCs
• Try to mark the whole object graph concurrently with the application running
• Based on Tri-color abstraction & Snapshot-At-The-Beginning algorithm
• Used by CMS, G1, Shenandoah
Concurrent Marking
5 •
Concurrent Marking: Tri-Color Abstraction
12 •
Concurrent Marking: Issues
• New allocations during the marking phase can be handled by:
• Automatically marking objects at allocation
• Not considering new allocations for the current cycle
• The tri-color abstraction gives 2 conditions that must both hold for a live object to be missed:
1. The mutator stores a reference to a white object into a black object.
2. All paths from any gray objects to that white object are destroyed.
http://www.memorymanagement.org/glossary/s.html#term-snapshot-at-the-beginning
13 •
Concurrent Marking: Issues
[Diagram, four frames: object graph with nodes A, B and C while marking is in progress; the mutator runs the two stores below]
A.field1 = C;
B.field2 = null;
OOPS!
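The lost-object scenario can be reproduced with a toy marker. This is a minimal sketch, not HotSpot code: `Node`, `Color`, `markStep` and `runScenario` are hypothetical names, and the starting state (A already scanned, B still gray, C white and reachable only through B) is an assumption chosen to satisfy the two conditions for a missed object.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TriColorDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        Color color = Color.WHITE;
        Node field1;
        Node field2;
    }

    /** Scans one gray node: shades its white children gray, then blackens it. */
    static void markStep(Deque<Node> grayQueue) {
        Node n = grayQueue.poll();
        if (n == null) {
            return;
        }
        for (Node child : new Node[] { n.field1, n.field2 }) {
            if (child != null && child.color == Color.WHITE) {
                child.color = Color.GRAY;
                grayQueue.add(child);
            }
        }
        n.color = Color.BLACK;
    }

    /** Runs the A/B/C scenario without any barrier and returns C's final color. */
    static Color runScenario() {
        Node a = new Node();
        Node b = new Node();
        Node c = new Node();
        b.field2 = c;                 // C is reachable only through B

        // A and B are roots, so both start gray.
        Deque<Node> gray = new ArrayDeque<>();
        a.color = Color.GRAY;
        gray.add(a);
        b.color = Color.GRAY;
        gray.add(b);

        markStep(gray);               // A is scanned and becomes black

        // The mutator runs between two marking steps:
        a.field1 = c;                 // white object stored into a black object
        b.field2 = null;              // the only gray path to C is destroyed

        while (!gray.isEmpty()) {
            markStep(gray);           // B is scanned, but it no longer points to C
        }
        return c.color;               // C is live (A.field1) yet was never shaded
    }

    public static void main(String[] args) {
        System.out.println("C is " + runScenario());
    }
}
```

C stays white although it is reachable from A, so a collector that reclaims white objects would free a live object.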
17 •
• 2 ways to ensure no live object is missed during marking
• Snapshot-At-The-Beginning
• Incremental Update
• SATB uses pre-write barriers that record the overwritten reference for marking
• Executed before a reference assignment (X.f = Y)
• SATB barrier is only active when Marking is on (global state)
Concurrent Marking: Resolving misses
if (SATB_WriteBarrier) {
    if (X.f != null)
        SATB_enqueue(X.f);
}
cmp BYTE PTR [r15+0x30],0x0
jne 0x000002965edc62e5
[...]
mov r11d,DWORD PTR [rbp+0x74]
test r11d,r11d
je 0x000002965edc6253
mov r10,QWORD PTR [r15+0x38]
mov rcx,r11
shl rcx,0x3
test r10,r10
je 0x000002965edc6318
mov r11,QWORD PTR [r15+0x48]
mov QWORD PTR [r11+r10*1-0x8],rcx
add r10,0xfffffffffffffff8
mov QWORD PTR [r15+0x38],r10
jmp 0x000002965edc6253
mov rdx,r15
movabs r10,0x7ffac2febc50
call r10
jmp 0x000002965edc6253
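The SATB pre-write barrier above can be sketched in plain Java. This is an illustration under stated assumptions, not the JIT-generated barrier: `Obj`, `storeField`, `satbQueue` and `drainAndMark` are hypothetical names, and a single shared queue stands in for the per-thread buffers a real collector would use.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SatbBarrierDemo {
    static final class Obj {
        Obj f;
        boolean marked;
    }

    // Global state: true only while concurrent marking is running.
    static boolean markingActive = false;

    // Stand-in for the SATB buffer that the barrier fills.
    static final Deque<Obj> satbQueue = new ArrayDeque<>();

    /** Performs x.f = y, preceded by the SATB pre-write barrier. */
    static void storeField(Obj x, Obj y) {
        if (markingActive) {
            Obj old = x.f;            // value about to be overwritten
            if (old != null) {
                satbQueue.add(old);   // keep the snapshot reachable for the marker
            }
        }
        x.f = y;
    }

    /** Marker side: drains the queue and marks every recorded object. */
    static int drainAndMark() {
        int newlyMarked = 0;
        Obj o;
        while ((o = satbQueue.poll()) != null) {
            if (!o.marked) {
                o.marked = true;      // a real marker would also trace o's fields
                newlyMarked++;
            }
        }
        return newlyMarked;
    }

    /** Overwrites the last reference to an object during marking. */
    static boolean demo() {
        satbQueue.clear();
        Obj b = new Obj();
        Obj c = new Obj();
        b.f = c;
        markingActive = true;
        storeField(b, null);          // barrier records c before the store
        markingActive = false;
        drainAndMark();
        return c.marked;
    }

    public static void main(String[] args) {
        System.out.println("c marked: " + demo());
    }
}
```

The barrier costs one flag test when marking is off, which matches the first compare-and-branch in the disassembly.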
Shenandoah
19 •
• Non-generational (partial collection is still an option)
• Region based
• Uses a Read Barrier: Brooks pointer
• Self-Healing
• Cooperation between mutator threads & GC threads
• Only for concurrent compaction
• Mostly based on G1 but with concurrent compaction
Shenandoah GC
20 •
• Initial Marking (STW)
• Concurrent Marking
• Final Remark (STW)
• Concurrent Cleanup
• Concurrent Evacuation
• Init Update References (STW)
• Concurrent Update References
• Final Update References (STW)
• Concurrent Cleanup
Shenandoah Phases
21 •
• SATB-style (like G1)
• 2 STW pauses for Initial Mark & Final Remark
• Conditional Write Barrier
• To deal with concurrent modification of object graph
Concurrent Marking
22 •
• Same principle as G1:
• Build CollectionSet with Garbage First!
• Evacuate to new regions to release the region for reuse
• Concurrent Evacuation done with the help of:
• 1 Read Barrier : Brooks pointer
• 4 Write Barriers
• Barriers help to keep the to-space invariant:
• All Writes are made into an object in to-space
Concurrent Evacuation
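"Build CollectionSet with Garbage First" can be sketched as a greedy selection over regions sorted by reclaimable bytes. This is a minimal sketch, not Shenandoah's actual heuristic: `Region`, `chooseCollectionSet` and the `copyBudget` limit on live bytes to evacuate are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CollectionSetDemo {
    static final class Region {
        final int id;
        final long capacity;
        final long liveBytes;

        Region(int id, long capacity, long liveBytes) {
            this.id = id;
            this.capacity = capacity;
            this.liveBytes = liveBytes;
        }

        long garbage() {
            return capacity - liveBytes;
        }
    }

    /**
     * Picks regions with the most garbage first, as long as the live data
     * that must be copied out of them fits in copyBudget bytes.
     */
    static List<Region> chooseCollectionSet(List<Region> regions, long copyBudget) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((x, y) -> Long.compare(y.garbage(), x.garbage()));

        List<Region> collectionSet = new ArrayList<>();
        long budget = copyBudget;
        for (Region r : sorted) {
            if (r.garbage() == 0) {
                break;                      // nothing to gain from here on
            }
            if (r.liveBytes <= budget) {
                collectionSet.add(r);
                budget -= r.liveBytes;
            }
        }
        return collectionSet;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region(0, 100, 90));
        regions.add(new Region(1, 100, 10));
        regions.add(new Region(2, 100, 50));
        regions.add(new Region(3, 100, 100));
        for (Region r : chooseCollectionSet(regions, 60)) {
            System.out.println("evacuate region " + r.id);
        }
    }
}
```

Regions that are mostly garbage are cheap to evacuate and free the most memory, which is why they are selected first.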
23 •
• All objects have an additional forwarding pointer
• Placed before the regular object
• Dereference the forwarding pointer for each access
• Memory footprint overhead
• Throughput overhead
Brooks pointers
Header
Brooks pointer
mov r13,QWORD PTR [r12+r14*8-0x8]
24 •
• Any write (even of a primitive) to a from-space object needs to be protected
• Exotic barriers:
• acmp (pointer comparison)
• CAS
• clone
Write Barriers
if (evacInProgress
    && inCollectionSet(obj)
    && notCopyYet(obj)) {
    evacuateObject(obj)
}
test BYTE PTR [r15+0x3c0],0x2
jne 0x000000000281bcbc
[...]
mov r10d,DWORD PTR [r13+0xc]
test r10d,r10d
je 0x000000000281bc2b
mov r11,QWORD PTR [r15+0x360]
mov rcx,r10
shl rcx,0x3
test r11,r11
je 0x000000000281bd0d
[...]
mov rdx,r15
movabs r10,0x62d1f660
call r10
jmp 0x000000000281bc2b
25 •
Concurrent Copy: GC thread
[Diagram, 4 animation frames. Labels: Header, Brooks pointer, From-Space, To-Space, GC thread; a second Header/Brooks pointer pair appears from the third frame on]
29 •
Concurrent Copy: Reader threads
[Diagram. Labels: Header, Brooks pointer, From-Space, To-Space, two Reader threads]
30 •
Concurrent Copy: Writer threads
[Diagram, 7 animation frames. Labels: Header, Brooks pointer, From-Space, To-Space, two Writer threads; up to three Header/Brooks pointer pairs are shown in the intermediate frames]
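The Brooks pointer indirection and the evacuating write barrier can be sketched together in Java. This is a simplified model under stated assumptions, not Shenandoah's implementation: `Obj`, `resolve`, `writeBarrier` and the `AtomicReference` used as forwarding pointer are hypothetical, and a compare-and-set is assumed to decide which of several racing writer threads installs its copy.

```java
import java.util.concurrent.atomic.AtomicReference;

final class BrooksPointerDemo {
    static final class Obj {
        // Brooks pointer: refers to the object itself until a copy is installed.
        final AtomicReference<Obj> forwardee = new AtomicReference<>();
        final boolean inCollectionSet;
        volatile int value;

        Obj(int value, boolean inCollectionSet) {
            this.value = value;
            this.inCollectionSet = inCollectionSet;
            forwardee.set(this);
        }
    }

    static volatile boolean evacInProgress = false;

    /** Read barrier: every access goes through the forwarding pointer. */
    static Obj resolve(Obj o) {
        return o.forwardee.get();
    }

    /** Write barrier: makes sure the write lands in the to-space copy. */
    static Obj writeBarrier(Obj o) {
        Obj current = resolve(o);
        if (evacInProgress && o.inCollectionSet && current == o) {
            Obj copy = new Obj(o.value, false);      // speculative copy
            if (o.forwardee.compareAndSet(o, copy)) {
                return copy;                         // this thread installed its copy
            }
            return o.forwardee.get();                // lost the race: use the winner's copy
        }
        return current;
    }

    static void write(Obj o, int v) {
        writeBarrier(o).value = v;
    }

    static int read(Obj o) {
        return resolve(o).value;
    }

    public static void main(String[] args) {
        Obj fromSpace = new Obj(1, true);
        evacInProgress = true;
        write(fromSpace, 42);                        // evacuates, then writes to the copy
        System.out.println(read(fromSpace) + " " + (resolve(fromSpace) != fromSpace));
    }
}
```

A losing thread simply discards its speculative copy, so all writers end up updating the same to-space object.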
37 •
• Late memory release
• Only happens when all refs updated (Concurrent Cleanup phase)
• Allocations can overrun the GC
• Failure modes:
• Pacing
• Degenerated GC
• FullGC
Extreme cases
38 •
• Since JDK13, notable changes introduced
• Load Reference Barrier
• Baker-style barrier (Relocation)
• Evaluated at reference load time
• Eliminating forward pointer word
• Store forward information into Mark word
• Remove Brooks pointer
Shenandoah 2.0?
Azul’s C4
40 •
• Generational (young & old)
• Region based (pages)
• Use Read Barrier: Loaded Value Barrier
• Self-Healing
• Cooperation between mutator threads & GC threads
• Pauseless algorithm but implementation requires safepoints
• Pauses are most of the time < 1ms
Continuously Concurrent Compacting Collector
41 •
• Baker-style based Barrier
• Moves objects using forwarding addresses stored aside
• Applied at load time, not when dereferencing
• Fused marking & relocation state
• Ensure C4 invariants:
• Marked Through the current cycle
• Not relocated
• If not => Self-healing process to correct it
• Mark object
• Relocate & correct reference
• Checked on each reference load
• Benefits from JIT optimization for caching loaded value (registers)
LVB
42 •
• States of objects stored inside reference address => Colored pointers
• NMT bit
• Remapped/Generation
• Checked against a global expected value during the GC cycle
• Thread local, almost always L1 cache hits
• Register
• Unmasking reference addresses:
• Linux Kernel module, aliasing
• memfd, multi-mapping
LVB
test r9, rax
jne 0x3001443b
mov r10d, dword ptr [rax + 8]
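The LVB check (test the loaded reference against an expected state, branch to a slow path on mismatch) can be sketched with references modeled as `long` values. This is an illustration only: the bit position of the NMT bit, the `Slot` class and `loadReference` are assumptions, not Zing's actual layout or API, and a real barrier would heal the slot with an atomic operation.

```java
final class LvbDemo {
    // Assumption for illustration: bit 62 of a reference carries the NMT state.
    static final long NMT_BIT = 1L << 62;

    // Expected NMT value for the current GC cycle (flipped when a cycle starts).
    static long expectedNmt = NMT_BIT;

    static int slowPathCount = 0;

    /** A heap location holding one colored reference. */
    static final class Slot {
        long ref;
    }

    /** Loaded Value Barrier: applied to the value just loaded from a slot. */
    static long loadReference(Slot slot) {
        long ref = slot.ref;
        if (ref != 0 && (ref & NMT_BIT) != expectedNmt) {
            // Slow path: a real collector would mark or relocate the object here.
            ref = (ref & ~NMT_BIT) | expectedNmt;
            slot.ref = ref;           // self-healing: later loads take the fast path
            slowPathCount++;
        }
        return ref;
    }

    /** Strips the metadata bit to obtain the plain address. */
    static long address(long ref) {
        return ref & ~NMT_BIT;
    }

    public static void main(String[] args) {
        Slot slot = new Slot();
        slot.ref = 0x1000L;           // NMT bit clear: stale for this cycle
        loadReference(slot);          // slow path, slot is healed
        long second = loadReference(slot);
        System.out.println(Long.toHexString(address(second)) + " slow paths: " + slowPathCount);
    }
}
```

Because the barrier fixes the source slot, each stale reference pays for the slow path at most once per cycle.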
43 •
Virtual Memory vs Physical Memory
[Diagram: virtual memory (0 to 2^64) shown against physical memory (0 to 2^37)]
44 •
• All phases are fully parallel & concurrent
• No "rush" to finish phases
• No constraint that STW pauses be short
• Physical memory released quickly in relocation phase
• Can be reused for new allocations
• Plenty of virtual space vs physical memory
C4 Phases
45 •
• Mark
• Marking all objects in graph
• Relocation
• Moving objects to release pages
• Remap
• Fixup references in object graph
• Folded with next mark cycle
C4 Phases
46 •
• Precise Wavefront Marking
• Single pass
• No final mark/remark
• Self-Healing: Mark objects that are not marked for the current cycle
Mark Phase
47 •
Mark Phase: Concurrent Modification
[Diagram, four frames: object graph with nodes A, B and C during marking while the mutator runs the two stores below; the last two frames are annotated "LVB"]
A.field1 = C;
B.field2 = null;
51 •
• Scanning roots (static variables, thread stacks, registers, JNI handles)
• GC threads scan stalled threads
• Running threads scan their own stacks, each stopping individually at a safepoint
• Scanning object graph like a parallel collector
• Newly allocated objects go into new pages, not considered for reclaim (relocation)
• For each page, summing live data bytes, used to select pages to reclaim
Mark Phase
52 •
• Select pages with the greatest number of dead objects (garbage first!)
• Protect the selected pages from being accessed by mutator threads
• Move objects to newly allocated pages
• Build side arrays (off-heap tables) for forwarding information
• Self-Healing: since the page is protected, the LVB triggers a trap to:
• Copy object to the new location if not done
• Use forward pointer to fix the reference
Relocation Phase
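The relocation steps above (protected page, side forwarding table, self-healing on access) can be sketched as follows. This is a toy model under stated assumptions: addresses are plain `long` values, `forward` only pretends to copy a 64-byte object, and `protectedPages`, `forwardingTable` and `load` are hypothetical names rather than C4 internals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RelocationDemo {
    static final long PAGE_SIZE = 2L * 1024 * 1024;

    // Pages selected for relocation; any reference into them is stale.
    static final Set<Long> protectedPages = new HashSet<>();

    // Side table kept outside the relocated pages: old address -> new address.
    static final Map<Long, Long> forwardingTable = new HashMap<>();

    static long nextFreeAddress = 64 * PAGE_SIZE;
    static int copies = 0;

    /** A heap location holding one reference. */
    static final class Slot {
        long ref;
    }

    static long pageOf(long address) {
        return address / PAGE_SIZE;
    }

    /** Returns the new location, copying the object first if nobody has yet. */
    static long forward(long oldAddress) {
        Long newAddress = forwardingTable.get(oldAddress);
        if (newAddress == null) {
            newAddress = nextFreeAddress;   // pretend to copy a 64-byte object
            nextFreeAddress += 64;
            forwardingTable.put(oldAddress, newAddress);
            copies++;
        }
        return newAddress;
    }

    /** Load with barrier: a reference into a protected page is fixed in place. */
    static long load(Slot slot) {
        long ref = slot.ref;
        if (protectedPages.contains(pageOf(ref))) {
            ref = forward(ref);
            slot.ref = ref;                 // self-healing
        }
        return ref;
    }

    public static void main(String[] args) {
        protectedPages.add(3L);
        Slot slot = new Slot();
        slot.ref = 3 * PAGE_SIZE + 128;
        System.out.println("relocated to page " + pageOf(load(slot)));
    }
}
```

Keeping the forwarding data outside the page is what lets the physical memory of a relocated page be released before every reference has been remapped.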
53 •
Relocation Phase
[Diagram, 10 animation frames. Labels: Virtual, Physical; a Forwarding table appears in the last four frames]
63 •
• Little chance that mutators stall when accessing a ref, since mostly-dead pages are processed
• Once object copy done, physical memory is released (Quick Release)
• Can be immediately reused (remapped) to satisfy new allocations
• Pages evacuated are still mapped & protected to help remap phase
• Cannot be released until all objects are remapped
• Not a problem as we have a huge virtual address space
Relocation Phase
64 •
• Traverse Object Graph and fixup references
• Execute LVB barrier for each object
• Self-Healing: fixup references using forward information
• As we traverse again, mark for the next phase
• Mark & Remap phases are folded!
Remap Phase
65 •
• Algorithm requires a sustainable rate of remapping operations
• Linux limitations:
• TLB invalidation
• Only 4KB pages can be remapped
• Single threaded remapping (write lock)
• Kernel module implements an API for the Zing JVM to significantly increase the remapping rate
• Also implements virtual address aliasing for addressing objects with metadata
Remap – Kernel module
66 •
• Young & Old collections done by same algorithm and can be concurrent
• Sizes of the generations are dynamically adjusted
• Card Marking with write barrier (Stored Value Barrier)
• Old collection is based on young-to-old roots generated by previous young cycle
• Young collection will perform card scanning per page
• Holds any concurrent Old collection for each page scanned
Generational
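Card marking with a write barrier can be sketched with a byte array covering the heap. This is a generic card table illustration, not C4's Stored Value Barrier: the 512-byte card size, the unconditional marking and the names `markCard` and `scanAndClean` are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class CardTableDemo {
    static final int CARD_SHIFT = 9;          // assumption: 512-byte cards
    static final byte CLEAN = 0;
    static final byte DIRTY = 1;

    final long heapBase;
    final byte[] cards;

    CardTableDemo(long heapBase, long heapSize) {
        this.heapBase = heapBase;
        long cardSize = 1L << CARD_SHIFT;
        this.cards = new byte[(int) ((heapSize + cardSize - 1) >>> CARD_SHIFT)];
    }

    /** Write barrier: run when a reference is stored into the field at fieldAddress. */
    void markCard(long fieldAddress) {
        cards[(int) ((fieldAddress - heapBase) >>> CARD_SHIFT)] = DIRTY;
    }

    /** Young collection side: returns the dirty card indexes and cleans them. */
    List<Integer> scanAndClean() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                dirty.add(i);
                cards[i] = CLEAN;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(0x10000L, 4096);
        table.markCard(0x10000L + 700);
        table.markCard(0x10000L + 4000);
        System.out.println("dirty cards: " + table.scanAndClean());
    }
}
```

A young collection then only scans the memory behind dirty cards for old-to-young references instead of walking the whole old generation.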
67 •
• Used by Hadoop Name Node
• 580GB Heap
• Very hard to tune with G1
• No issue so far regarding GC since production roll out (Oct 2017)
C4 @ Criteo
Z GC
69 •
• Non generational
• Region based (zPages, dynamically sized)
• Concurrent Marking, Compaction, Ref processing
• Use Colored Pointers & Read/Load Barrier
• Self-Healing
• Cooperation between mutator threads & GC threads
• Experimental in JDK 11 (-XX:+UnlockExperimentalVMOptions -XX:+UseZGC)
Z GC
mov r10,QWORD PTR [r11+0xb0]
test QWORD PTR [r15+0x20],r10
jne 0x00007f9594cc54b5
70 •
Z GC
71 •
• Initial Mark (STW)
• Concurrent Mark/Remap
• Final Mark (STW)
• Concurrent Prepare for Relocation
• Start Relocate (STW)
• Concurrent Relocate
Z GC phases:
72 •
• Store metadata in unused bits of reference address
• 42 bits for addressing (4TB)
• 44 bits (16TB) for JDK 13
• 4 bits for metadata
• Marked0
• Marked1
• Remapped
• Finalizable
Colored Pointers
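The pointer layout above can be sketched with bit masks on a `long`. This is a minimal sketch: the 42 address bits and the four metadata bits come from the slide, while the exact bit order (Marked0 lowest, then Marked1, Remapped, Finalizable), the `badMaskFor` helper and the way the barrier test is modeled are assumptions for illustration.

```java
final class ColoredPointerDemo {
    static final int ADDRESS_BITS = 42;                           // 2^42 bytes = 4TB
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    static final long MARKED0     = 1L << ADDRESS_BITS;
    static final long MARKED1     = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED    = 1L << (ADDRESS_BITS + 2);
    static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);

    /** Combines a heap address with one color bit. */
    static long color(long address, long colorBit) {
        return (address & ADDRESS_MASK) | colorBit;
    }

    /** Strips the metadata bits before dereferencing. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** Every color except the currently good one is bad. */
    static long badMaskFor(long goodColor) {
        return (MARKED0 | MARKED1 | REMAPPED) & ~goodColor;
    }

    /** Load barrier test: a pointer with any bad bit set needs the slow path. */
    static boolean needsSlowPath(long pointer, long badMask) {
        return (pointer & badMask) != 0;
    }

    public static void main(String[] args) {
        long p = color(0x1234L, MARKED0);
        System.out.println(Long.toHexString(address(p))
                + " stale after relocation starts: " + needsSlowPath(p, badMaskFor(REMAPPED)));
    }
}
```

Testing against a single mask keeps the fast path to one test and one branch, similar in shape to the `test`/`jne` pair in the disassembly.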
73 •
• Colored pointers need to be unmasked for dereferencing
• Some HW supports masking (SPARC, AArch64)
• On Linux/Windows, overhead if done with classical instructions
• Only one view is active at any point
• Plenty of Virtual Space
Multi-Mapping
74 •
Multi-Mapping
[Diagram: three views of the virtual address space (0 to 2^64), marked0 = 001<address>, marked1 = 010<address>, remapped = 100<address>, shown against physical memory (0 to 2^37)]
75 •
Memory Usage
76 •
• Page sizes are multiples of 2MB
• 3 different groups
• Small: 2MB pages with object size <= 256KB
• Medium: 32MB pages with object size <= 4MB
• Large: 2MB pages, each object spans several of them
• Objects in the Large group are not meant to be relocated (too expensive)
Page Allocations
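The three page groups can be expressed as a small size-class function. This is a sketch of the thresholds listed on the slide only: `pageTypeFor` and `pageSizeFor` are hypothetical helpers, and rounding a large object up to the next multiple of 2MB is an assumption about how "multiple of 2MB" applies.

```java
final class ZPageDemo {
    enum PageType { SMALL, MEDIUM, LARGE }

    static final long KB = 1024;
    static final long MB = 1024 * KB;

    /** Chooses the page group from the object size, using the slide's thresholds. */
    static PageType pageTypeFor(long objectSize) {
        if (objectSize <= 256 * KB) {
            return PageType.SMALL;
        }
        if (objectSize <= 4 * MB) {
            return PageType.MEDIUM;
        }
        return PageType.LARGE;
    }

    /** Size of the page that would hold an object of the given size. */
    static long pageSizeFor(long objectSize) {
        switch (pageTypeFor(objectSize)) {
            case SMALL:
                return 2 * MB;
            case MEDIUM:
                return 32 * MB;
            default:
                // LARGE: round the object size up to a multiple of 2MB.
                return ((objectSize + 2 * MB - 1) / (2 * MB)) * (2 * MB);
        }
    }

    public static void main(String[] args) {
        long[] sizes = { 100, MB, 5 * MB };
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + pageTypeFor(size)
                    + " page of " + pageSizeFor(size) / MB + "MB");
        }
    }
}
```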
77 •
• Unmasking ref addresses
• C4: Kernel module aliasing
• Z: Multi-mapping or HW support
• Pages & Relocation
• C4:
• Pages are fixed size (2MB)
• Relocation for large objects by remapping
• Z:
• zPages are dynamic, a zPage can be 100MB large
• No relocation for large objects
Difference between C4 & Z GC
How to choose a GC algorithm
79 •
• You have to run on Windows
• Shenandoah
• Battle-tested GC (maturity)
• C4
• Shenandoah
• Minimizing any kind of JVM pauses
• C4
• Z
• You don’t want to pay for it:
• Shenandoah
• Z
Low latency GCs
References
81 •
• Java Garbage Collection distilled by Martin Thompson
• The Java GC mini book
• Oracle’s white paper on JVM memory management & GC
• What differences JVM makes by Nitsan Wakart
• Memory Management Reference
• IBM Pause-Less GC
References GC Basics
82 •
• Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK
• Shenandoah: The Garbage Collector That Could by Aleksey Shipilev
• Shenandoah GC Wiki
• Load Reference Barriers by Roman Kennke
• Eliminating forward pointer word by Roman Kennke
References Shenandoah
83 •
• The Pauseless GC algorithm (2005)
• C4: Continuously Concurrent Compacting Collector (2011)
• Azul GC in Detail by Charles Humble
• 2010 version source code
References C4
84 •
• ZGC - Low Latency GC for OpenJDK by Per Liden
• Java's new Z Garbage Collector (ZGC) is very exciting by Richard Warburton
• A first look into ZGC by Dominik Inführ
• Architectural Comparison with C4/Pauseless
• ZGC Heap Size and RSS counters
References ZGC
Thank You!
@jpbempel

『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119
 
ETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptx
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptx
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptx
 
TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptx
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 

Understanding low latency jvm gcs V2

  • 2. Understanding JVM GC: Advanced! • Concurrent Marking • Shenandoah • Azul’s C4 • ZGC • How to choose a GC algorithm?
  • 4. Concurrent Marking • Already used by the CMS & G1 algorithms, and by all low latency GCs • Tries to mark the whole object graph concurrently while the application is running • Based on the Tri-Color abstraction & the Snapshot-At-The-Beginning algorithm • Used by CMS, G1, Shenandoah
  • 5-11. Concurrent Marking: Tri-Color Abstraction (diagram sequence)
  • 12. Concurrent Marking: Issues • New allocations during the marking phase can be handled by: • Automatically marking each object at allocation • Not considering new allocations for the current cycle • Under the Tri-Color abstraction, an object is missed only when both conditions hold: 1. The mutator stores a reference to a white object into a black object. 2. All paths from any gray objects to that white object are destroyed. http://www.memorymanagement.org/glossary/s.html#term-snapshot-at-the-beginning
  • 13-16. Concurrent Marking: Issues (diagram sequence with objects A, B, C) • A.field1 = C; B.field2 = null; • OOPS!
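The missed-object scenario from these slides can be replayed with a toy marker. This is only a sketch: the `Node` and `Color` types, the `scan` helper and the choice of which field initially links A to B are assumptions made for illustration, not how any JVM represents objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of tri-color marking; not JVM code.
class TriColorDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        Color color = Color.WHITE;
        Node field1, field2;
    }

    // One marking step: shade the white children of n gray, then blacken n.
    static void scan(Node n, Deque<Node> grayQueue) {
        for (Node child : new Node[] { n.field1, n.field2 }) {
            if (child != null && child.color == Color.WHITE) {
                child.color = Color.GRAY;
                grayQueue.add(child);
            }
        }
        n.color = Color.BLACK;
    }

    // Replays the interleaving from the slides and returns the final color of C.
    static Color run() {
        Node a = new Node(), b = new Node(), c = new Node();
        a.field2 = b;                 // A -> B (assumed initial link)
        b.field2 = c;                 // B -> C
        Deque<Node> gray = new ArrayDeque<>();
        a.color = Color.GRAY;         // A is the root
        gray.add(a);

        scan(gray.poll(), gray);      // marker: A becomes black, B gray

        // The mutator runs between two marking steps, with no barrier.
        a.field1 = c;                 // white C stored into black A
        b.field2 = null;              // only path from gray B to C destroyed

        while (!gray.isEmpty()) {     // marker finishes the cycle
            scan(gray.poll(), gray);
        }
        return c.color;
    }

    public static void main(String[] args) {
        System.out.println("C is " + run()); // prints "C is WHITE"
    }
}
```

C is still reachable through A, yet it ends the cycle white, so a collector without a barrier would reclaim a live object.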
  • 17. Concurrent Marking: Resolving misses • 2 ways to ensure no object is missed during marking: • Snapshot-At-The-Beginning • Incremental Update • SATB uses pre-write barriers that record the previous value of the field for marking • Executed before a reference assignment (X.f = Y) • The SATB barrier is only active while marking is on (global state)
      if (SATB_WriteBarrier) {
          if (X.f != null)
              SATB_enqueue(X.f);
      }
      cmp BYTE PTR [r15+0x30],0x0
      jne 0x000002965edc62e5
      [...]
      mov r11d,DWORD PTR [rbp+0x74]
      test r11d,r11d
      je 0x000002965edc6253
      mov r10,QWORD PTR [r15+0x38]
      mov rcx,r11
      shl rcx,0x3
      test r10,r10
      je 0x000002965edc6318
      mov r11,QWORD PTR [r15+0x48]
      mov QWORD PTR [r11+r10*1-0x8],rcx
      add r10,0xfffffffffffffff8
      mov QWORD PTR [r15+0x38],r10
      jmp 0x000002965edc6253
      mov rdx,r15
      movabs r10,0x7ffac2febc50
      call r10
      jmp 0x000002965edc6253
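A minimal Java sketch of the SATB pre-write barrier described on this slide, replaying the A/B/C scenario. The `Node` type, the `write` helper and the way the marker drains the queue are assumptions for illustration; a real JVM uses per-thread buffers and compiled barriers like the assembly shown on the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a Snapshot-At-The-Beginning pre-write barrier.
class SatbDemo {
    static final class Node {
        final Node[] fields = new Node[2];
    }

    static boolean markingActive;                          // global marking state
    static final Deque<Node> satbQueue = new ArrayDeque<>();

    // X.f = Y, preceded by the barrier: log the old value while marking is on.
    static void write(Node x, int f, Node y) {
        Node old = x.fields[f];
        if (markingActive && old != null) {
            satbQueue.add(old);
        }
        x.fields[f] = y;
    }

    // Marks everything reachable from the work queue.
    static void markFrom(Deque<Node> work, Set<Node> marked) {
        while (!work.isEmpty()) {
            Node n = work.poll();
            if (!marked.add(n)) continue;                  // already marked
            for (Node child : n.fields) {
                if (child != null) work.add(child);
            }
        }
    }

    // Returns true if C is marked at the end of the cycle.
    static boolean run() {
        satbQueue.clear();
        Node a = new Node(), b = new Node(), c = new Node();
        a.fields[1] = b;
        b.fields[1] = c;

        Set<Node> marked = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>();
        markingActive = true;
        marked.add(a);                                     // A already scanned (black)
        work.add(b);                                       // B pending (gray)

        write(a, 0, c);                                    // A.field1 = C
        write(b, 1, null);                                 // B.field2 = null, old value C is logged

        markFrom(work, marked);                            // finish regular marking
        markFrom(satbQueue, marked);                       // then drain the SATB log
        markingActive = false;
        return marked.contains(c);
    }

    public static void main(String[] args) {
        System.out.println("C marked: " + run());
    }
}
```

Because the barrier logs the value that is about to be overwritten, everything reachable at the start of marking stays marked, at the cost of possibly keeping some garbage alive until the next cycle.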
  • 19. Shenandoah GC • Non-generational (partial collection is still an option) • Region based • Uses a Read Barrier: Brooks pointer • Self-Healing • Cooperation between mutator threads & GC threads • Only for concurrent compaction • Mostly based on G1 but with concurrent compaction
  • 20. Shenandoah Phases • Initial Marking (STW) • Concurrent Marking • Final Remark (STW) • Concurrent Cleanup • Concurrent Evacuation • Init Update References (STW) • Concurrent Update References • Final Update References (STW) • Concurrent Cleanup
  • 21. Concurrent Marking • SATB-style (like G1) • 2 STW pauses for Initial Mark & Final Remark • Conditional Write Barrier • To deal with concurrent modification of the object graph
  • 22. Concurrent Evacuation • Same principle as G1: • Build the CollectionSet with Garbage First! • Evacuate to new regions to release the region for reuse • Concurrent Evacuation is done with the help of: • 1 Read Barrier: Brooks pointer • 4 Write Barriers • Barriers help to keep the to-space invariant: • All writes are made into an object in to-space
  • 23. Brooks pointers • All objects have an additional forwarding pointer • Placed before the regular object • Dereference the forwarding pointer for each access • Memory footprint overhead • Throughput overhead (diagram: Header, Brooks pointer)
      mov r13,QWORD PTR [r12+r14*8-0x8]
  • 24. Write Barriers • Any write (even of a primitive) to a from-space object needs to be protected • Exotic barriers: • acmp (pointer comparison) • CAS • clone
      if (evacInProgress && inCollectionSet(obj) && notCopyYet(obj)) {
          evacuateObject(obj)
      }
      test BYTE PTR [r15+0x3c0],0x2
      jne 0x000000000281bcbc
      [...]
      mov r10d,DWORD PTR [r13+0xc]
      test r10d,r10d
      je 0x000000000281bc2b
      mov r11,QWORD PTR [r15+0x360]
      mov rcx,r10
      shl rcx,0x3
      test r11,r11
      je 0x000000000281bd0d
      [...]
      mov rdx,r15
      movabs r10,0x62d1f660
      call r10
      jmp 0x000000000281bc2b
  • 25-28. Concurrent Copy: GC thread (diagram sequence: Header and Brooks pointer in From-Space and To-Space, GC thread)
  • 29. Concurrent Copy: Reader threads (diagram: Header and Brooks pointer, From-Space, To-Space, two reader threads)
  • 30-36. Concurrent Copy: Writer threads (diagram sequence: Header and Brooks pointer in From-Space and To-Space, two writer threads)
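The concurrent copy protocol in these slides can be approximated in Java with an atomic forwarding field standing in for the Brooks pointer. This is a sketch under assumptions: `Obj`, `resolve`, `evacuate` and `writeValue` are hypothetical names, and the compare-and-set race between copiers is an assumed reading of the writer-thread diagrams.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of Brooks-pointer forwarding with racing evacuation.
class BrooksDemo {
    static final class Obj {
        // Stand-in for the Brooks pointer: points to the object itself until copied.
        final AtomicReference<Obj> forwardee = new AtomicReference<>();
        int value;

        Obj(int value) {
            this.value = value;
            forwardee.set(this);
        }
    }

    // Read barrier: always go through the forwarding pointer.
    static Obj resolve(Obj o) {
        return o.forwardee.get();
    }

    // Copies the object to to-space; if several threads race, one copy wins.
    static Obj evacuate(Obj from) {
        Obj current = from.forwardee.get();
        if (current != from) {
            return current;                        // someone already copied it
        }
        Obj copy = new Obj(from.value);
        if (from.forwardee.compareAndSet(from, copy)) {
            return copy;                           // our copy was installed
        }
        return from.forwardee.get();               // lost the race, drop our copy
    }

    // Write barrier: never write into a from-space object during evacuation.
    static void writeValue(Obj o, int v, boolean evacInProgress, boolean inCollectionSet) {
        Obj target = resolve(o);
        if (evacInProgress && inCollectionSet && target == o) {
            target = evacuate(o);
        }
        target.value = v;
    }

    public static void main(String[] args) {
        Obj o = new Obj(1);
        writeValue(o, 2, true, true);
        System.out.println(resolve(o).value + " (stale from-space copy holds " + o.value + ")");
    }
}
```

Readers that go through `resolve` see the to-space copy as soon as the forwarding pointer is swung, which is what keeps the to-space invariant for writes.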
  • 37. Extreme cases • Late memory release • Only happens when all refs are updated (Concurrent Cleanup phase) • Allocations can overrun the GC • Failure modes: • Pacing • Degenerated GC • Full GC
  • 38. Shenandoah 2.0? • Since JDK 13, notable changes have been introduced • Load Reference Barrier • Baker-style barrier (Relocation) • Evaluated at reference load time • Eliminating the forwarding pointer word • Store forwarding information in the Mark word • Remove the Brooks pointer
  • 40. Continuously Concurrent Compacting Collector • Generational (young & old) • Region based (pages) • Uses a Read Barrier: Loaded Value Barrier • Self-Healing • Cooperation between mutator threads & GC threads • Pauseless algorithm, but the implementation requires safepoints • Pauses are < 1ms most of the time
  • 41. LVB • Baker-style barrier • Moves objects through forwarding addresses stored aside • Applied at load time, not when dereferencing • Fused marking & relocation state • Ensures the C4 invariants: • Marked Through in the current cycle • Not relocated • If not => Self-healing process to correct it • Mark object • Relocate & correct reference • Checked for each reference load • Benefits from JIT optimization for caching loaded values (registers)
  • 42. LVB • States of objects are stored inside the reference address => Colored pointers • NMT bit • Remapped/Generation • Checked against a global expected value during the GC cycle • Thread local, almost always L1 cache hits • Register • Unmasking reference addresses: • Linux Kernel module, aliasing • memfd, multi-mapping
      test r9, rax
      jne 0x3001443b
      mov r10d, dword ptr [rax + 8]
  • 43. Virtual Memory vs Physical Memory (diagram: Virtual Memory from 0 to 2^64, Physical Memory from 0 to 2^37)
  • 44. C4 Phases • All phases are fully parallel & concurrent • No "rush" to finish phases • No constraint to keep an STW pause short • Physical memory is released quickly in the relocation phase • Can be reused for new allocations • Plenty of virtual space vs physical memory
  • 45. C4 Phases • Mark • Marks all objects in the graph • Relocation • Moves objects to release pages • Remap • Fixes up references in the object graph • Folded with the next mark cycle
  • 46. Mark Phase • Precise Wavefront Marking • Single pass • No final mark/remark • Self-Healing: marks objects that are not yet marked for the current cycle
  • 47-50. Mark Phase: Concurrent Modification (diagram sequence with objects A, B, C) • A.field1 = C; B.field2 = null; • LVB
  • 51. Mark Phase • Scanning roots (static variables, thread stacks, registers, JNI handles) • GC threads scan stalled threads • Running threads scan their own stack, stopping individually at a safepoint • Scanning the object graph like a parallel collector • Newly allocated objects go into new pages, not considered for reclaim (relocation) • For each page, live data bytes are summed and used to select pages to reclaim
  • 52. Relocation Phase • Select pages with the greatest number of dead objects (garbage first!) • Protect the selected pages from being accessed by mutator threads • Move objects to newly allocated pages • Build side arrays (off-heap tables) for forwarding information • Self-Healing: as the page is protected, the LVB will trigger a trap to: • Copy the object to the new location if not done yet • Use the forwarding pointer to fix the reference
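The self-healing step of the relocation phase can be illustrated with a side forwarding table and a barrier applied on each reference load. This is a rough sketch only: `Obj`, `Ref`, the `onProtectedPage` flag and the in-heap `IdentityHashMap` are hypothetical stand-ins for page protection and the off-heap tables mentioned on the slide.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical model of relocation with a side forwarding table and self-healing loads.
class RelocationDemo {
    static final class Obj {
        int payload;
        boolean onProtectedPage;   // stands in for page protection
    }

    // A heap slot (field) that holds a reference.
    static final class Ref {
        Obj target;
    }

    // Forwarding information kept aside, not inside the objects.
    static final Map<Obj, Obj> forwarding = new IdentityHashMap<>();

    // Barrier applied when a reference is loaded from a slot.
    static Obj load(Ref slot) {
        Obj o = slot.target;
        if (o != null && o.onProtectedPage) {
            // Slow path: copy the object if nobody did yet, then look up its new location.
            Obj moved = forwarding.computeIfAbsent(o, old -> {
                Obj copy = new Obj();
                copy.payload = old.payload;
                return copy;
            });
            slot.target = moved;   // self-healing: the slot never traps again
            o = moved;
        }
        return o;
    }

    public static void main(String[] args) {
        Obj old = new Obj();
        old.payload = 42;
        old.onProtectedPage = true;
        Ref slot = new Ref();
        slot.target = old;
        System.out.println(load(slot).payload + ", healed: " + (slot.target != old));
    }
}
```

After the first load through a slot, the slot holds the new address, so later loads through it take the fast path.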
  • 63. Relocation Phase • Mutators rarely stall on accessing a ref, as mostly dead pages are processed • Once the object copy is done, physical memory is released (Quick Release) • Can be immediately reused (remapped) to satisfy new allocations • Evacuated pages are still mapped & protected to help the remap phase • Cannot be released until all objects are remapped • Not a problem as we have a huge virtual address space
  • 64. Remap Phase • Traverse the object graph and fix up references • Execute the LVB barrier for each object • Self-Healing: fix up references using forwarding information • As we traverse again, mark for the next phase • Mark & Remap phases are folded!
  • 65. Remap – Kernel module • The algorithm requires a sustainable rate of remapping operations • Linux limitations: • TLB invalidation • Only 4KB pages can be remapped • Single-threaded remapping (write lock) • The kernel module implements an API for the Zing JVM to significantly increase the remapping rate • It also implements virtual address aliasing for addressing objects with metadata
  • 66. Generational • Young & Old collections are done by the same algorithm and can be concurrent • The sizes of the generations are dynamically adjusted • Card Marking with a write barrier (Stored Value Barrier) • Old collection is based on young-to-old roots generated by the previous young cycle • Young collection performs card scanning per page • Holds any concurrent Old collection per page scanned
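Card marking with a write barrier can be sketched as a byte table indexed by heap offset. Everything specific here is an assumption for illustration: the 512-byte card size, the `storedValueBarrier` signature, and the rule of dirtying a card only when an old object receives a young reference are a generic card-marking scheme, not the documented behavior of C4.

```java
// Hypothetical card table; card size and barrier condition are assumptions.
class CardTableDemo {
    static final int CARD_SHIFT = 9;   // assumed 512-byte cards
    final byte[] cards;

    CardTableDemo(int heapBytes) {
        // One byte per card, rounding the heap size up to a whole card.
        cards = new byte[(heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
    }

    // Runs on a reference store into the field at heap offset fieldOffset.
    void storedValueBarrier(int fieldOffset, boolean holderIsOld, boolean valueIsYoung) {
        if (holderIsOld && valueIsYoung) {
            cards[fieldOffset >>> CARD_SHIFT] = 1;   // dirty the card covering the field
        }
    }

    // Young collection: count the dirty cards to scan in page [pageStart, pageEnd).
    int dirtyCardsInPage(int pageStart, int pageEnd) {
        int count = 0;
        for (int c = pageStart >>> CARD_SHIFT; c < (pageEnd >>> CARD_SHIFT); c++) {
            if (cards[c] != 0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(4096);
        table.storedValueBarrier(600, true, true);
        System.out.println("dirty cards in first page: " + table.dirtyCardsInPage(0, 2048));
    }
}
```

Scanning per page then only needs to visit the dirty cards of that page instead of the whole old generation.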
  • 67. C4 @ Criteo • Used by the Hadoop NameNode • 580GB heap • Very hard to tune with G1 • No GC issue so far since the production rollout (Oct 2017)
  • 68. Z GC
  • 69. Z GC • Non-generational • Region based (zPages, dynamically sized) • Concurrent Marking, Compaction, Ref processing • Uses Colored Pointers & a Read/Load Barrier • Self-Healing • Cooperation between mutator threads & GC threads • Experimental in JDK 11 (-XX:+UnlockExperimentalVMOptions -XX:+UseZGC)
      mov r10,QWORD PTR [r11+0xb0]
      test QWORD PTR [r15+0x20],r10
      jne 0x00007f9594cc54b5
  • 71. Z GC phases • Initial Mark (STW) • Concurrent Mark/Remap • Final Mark (STW) • Concurrent Prepare for Relocation • Start Relocate (STW) • Concurrent Relocate
  • 72. Colored Pointers • Store metadata in unused bits of the reference address • 42 bits for addressing (4TB) • 44 bits (16TB) for JDK 13 • 4 bits for metadata • Marked0 • Marked1 • Remapped • Finalizable
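The bit arithmetic behind colored pointers can be checked with a few masks. This sketch assumes the four metadata bits sit directly above the 42 address bits, in the order Marked0, Marked1, Remapped, Finalizable; the constant names and the `needsSlowPath` check are illustrative, not the JVM's actual definitions.

```java
// Hypothetical colored-pointer layout: 42 address bits, metadata bits above them.
class ColoredPointerDemo {
    static final int ADDRESS_BITS = 42;                          // 2^42 bytes = 4TB
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    static final long MARKED0     = 1L << ADDRESS_BITS;
    static final long MARKED1     = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED    = 1L << (ADDRESS_BITS + 2);
    static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);
    static final long COLOR_MASK  = MARKED0 | MARKED1 | REMAPPED;

    // Replaces the color of a pointer, keeping its address bits.
    static long withColor(long pointer, long color) {
        return (pointer & ADDRESS_MASK) | color;
    }

    // Unmasks a colored pointer back to a plain address.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    // Assumed barrier check: any color other than the expected one takes the slow path.
    static boolean needsSlowPath(long pointer, long goodColor) {
        return (pointer & (COLOR_MASK & ~goodColor)) != 0;
    }

    public static void main(String[] args) {
        long p = withColor(0x1000L, MARKED0);
        System.out.println("addressable TB: " + ((1L << ADDRESS_BITS) >> 40));
        System.out.println("stale during remapped view: " + needsSlowPath(p, REMAPPED));
    }
}
```

With 44 address bits instead of 42 the same arithmetic gives 16TB, matching the JDK 13 figure on the slide.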
  • 73. Multi-Mapping • Colored pointers need to be unmasked for dereferencing • Some HW supports masking (SPARC, AArch64) • On Linux/Windows, overhead if done with classical instructions • Only one view is active at any point • Plenty of virtual space
  • 74. Multi-Mapping (diagram: Virtual Memory from 0 to 2^64 mapped onto Physical Memory from 0 to 2^37) • (marked0) 001<address> • (marked1) 010<address> • (remapped) 100<address>
  • 76. Page Allocations • Page sizes are multiples of 2MB • 3 different groups • Small: 2MB pages with object size <= 256KB • Medium: 32MB pages with object size <= 4MB • Large: 2MB pages, objects span multiple of them • Objects in the Large group are not meant to be relocated (too expensive)
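The three page groups translate into a simple size-class lookup. This sketch uses only the thresholds stated on the slide; the `Group` enum, the method names and the rounding of large allocations up to a multiple of 2MB are assumptions for illustration.

```java
// Hypothetical size-class selection based on the thresholds from the slide.
class ZPageSizing {
    enum Group { SMALL, MEDIUM, LARGE }

    static final long KB = 1024;
    static final long MB = 1024 * KB;

    static Group groupFor(long objectSize) {
        if (objectSize <= 256 * KB) return Group.SMALL;
        if (objectSize <= 4 * MB) return Group.MEDIUM;
        return Group.LARGE;
    }

    // Size of the page (or run of 2MB pages) that would hold the object.
    static long pageSizeFor(long objectSize) {
        switch (groupFor(objectSize)) {
            case SMALL:
                return 2 * MB;
            case MEDIUM:
                return 32 * MB;
            default:
                // Large: assumed to round up to a whole number of 2MB pages.
                return ((objectSize + 2 * MB - 1) / (2 * MB)) * (2 * MB);
        }
    }

    public static void main(String[] args) {
        System.out.println(groupFor(100) + " " + groupFor(MB) + " " + groupFor(5 * MB));
    }
}
```

A 5MB object, for example, falls in the Large group and would occupy three 2MB pages under this rounding.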
  • 77. Difference between C4 & Z GC • Unmasking ref addresses • C4: Kernel module aliasing • Z: Multi-mapping or HW support • Pages & Relocation • C4: • Pages are fixed (2MB) • Relocation of large objects by remapping • Z: • zPages are dynamic, a zPage can be 100MB large • No relocation for large objects
  • 78. How to choose a GC algorithm
  • 79. Low latency GCs • You have to run on Windows: • Shenandoah • Battlefield-tested GC (maturity): • C4 • Shenandoah • Minimizing any kind of JVM pauses: • C4 • Z • You don’t want to pay for it: • Shenandoah • Z
  • 81. References: GC Basics • Java Garbage Collection distilled by Martin Thompson • The Java GC mini book • Oracle’s white paper on JVM memory management & GC • What differences JVM makes by Nitsan Wakart • Memory Management Reference • IBM Pause-Less GC
  • 82. References: Shenandoah • Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK • Shenandoah: The Garbage Collector That Could by Aleksey Shipilev • Shenandoah GC Wiki • Load Reference Barriers by Roman Kennke • Eliminating forward pointer word by Roman Kennke
  • 83. References: C4 • The Pauseless GC algorithm (2005) • C4: Continuously Concurrent Compacting Collector (2011) • Azul GC in Detail by Charles Humble • 2010 version source code
  • 84. References: ZGC • ZGC - Low Latency GC for OpenJDK by Per Liden • Java's new Z Garbage Collector (ZGC) is very exciting by Richard Warburton • A first look into ZGC by Dominik Inführ • Architectural Comparison with C4/Pauseless • ZGC Heap Size and RSS counters