SlideShare ist ein Scribd-Unternehmen logo
1 von 59
© Copyright - TTNR Labs Limited 2018
1
JCTools: Pit Of Performance
Nitsan Wakart/@nitsanw
InfoQ.com: News & Community Site
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
jctools-algorithms-optimization
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon London
www.qconlondon.com
© Copyright - TTNR Labs Limited 2018 2
You Look Familiar...
© Copyright - TTNR Labs Limited 2018 3
Why me?
●
Main JCTools contributor
●
Performance Eng.
●
Find me on:
– GitHub : nitsanw
– Blog : psy-lob-saw.blogspot.com
– Twitter : nitsanw
●
Also: Cape Town JUG Organizer
© Copyright - TTNR Labs Limited 2018 4
JCTools?
●
Concurrent Queues:
– SPSC/MPSC/SPMC/MPMC
– Linked/Array/LinkedArray/Compound
●
And MOAR!!!
© Copyright - TTNR Labs Limited 2018 5
What is this about?
●
Optimizations as found in the code
●
Design pressures/choices
●
Novel algorithm: MPSC linked queues
© Copyright - TTNR Labs Limited 2018 6
From the codebase...
●
Layout
●
Unsafe
●
Concurrency specialization
●
Bits and bobs
© Copyright - TTNR Labs Limited 2018 7
Layout
abstract class ...L1Pad<E> extends ...ColdField<E> {
long p00,p01,p02,p03,p04,p05,p06,p07,
p08,p09,p10,p11,p12,p13,p14;
}
abstract class ...ProducerIndexFields<E> extends ...L1Pad<E> {
private volatile long producerIndex;
protected long producerLimit;
}
abstract class ...L2Pad<E> extends ...ProducerIndexFields<E> {
long p00,p01,p02,p03,p04,p05,p06,p07,
p08,p09,p10,p11,p12,p13,p14;
}
© Copyright - TTNR Labs Limited 2018 8
Layout
abstract class ...L1Pad<E> extends ...ColdField<E> {
long p00,p01,p02,p03,p04,p05,p06,p07,
p08,p09,p10,p11,p12,p13,p14;
}
abstract class ...ProducerIndexFields<E> extends ...L1Pad<E> {
private volatile long producerIndex;
protected long producerLimit;
}
abstract class ...L2Pad<E> extends ...ProducerIndexFields<E> {
long p00,p01,p02,p03,p04,p05,p06,p07,
p08,p09,p10,p11,p12,p13,p14;
}
© Copyright - TTNR Labs Limited 2018 9
Measure with/with out:
With padding: ~950 ns/op
Sans padding: ~5000 ns/op
*QueueBurst, 100 messages benchmark
© Copyright - TTNR Labs Limited 2018 10
Java Object Layout
●
8b aligned
●
Object header (8/12/16b)
●
Sub-classes fields after parent fields
●
Field order within class can change
© Copyright - TTNR Labs Limited 2018 11
Let’s use JOL!!!
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# WARNING | Compressed references base/shifts are guessed by the experiment!
# WARNING | Therefore, computed addresses are just guesses, and ARE NOT RELIABLE.
# WARNING | Make sure to attach Serviceability Agent to get the reliable addresses.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Instantiated the sample instance via public org.jctools.queues.FFBuffer(int)
org.jctools.queues.SpscArrayQueue object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 37 28 01 f8 (00110111 00101000 00000001 11111000) (-134141897)
12 4 (alignment/padding gap)
16 8 long ConcurrentCircularArrayQueueL0Pad.p00 0
24 8 long ConcurrentCircularArrayQueueL0Pad.p01 0
32 8 long ConcurrentCircularArrayQueueL0Pad.p02 0
40 8 long ConcurrentCircularArrayQueueL0Pad.p03 0
48 8 long ConcurrentCircularArrayQueueL0Pad.p04 0
56 8 long ConcurrentCircularArrayQueueL0Pad.p05 0
64 8 long ConcurrentCircularArrayQueueL0Pad.p06 0
72 8 long ConcurrentCircularArrayQueueL0Pad.p07 0
80 8 long ConcurrentCircularArrayQueueL0Pad.p08 0
88 8 long ConcurrentCircularArrayQueueL0Pad.p09 0
96 8 long ConcurrentCircularArrayQueueL0Pad.p10 0
104 8 long ConcurrentCircularArrayQueueL0Pad.p11 0
112 8 long ConcurrentCircularArrayQueueL0Pad.p12 0
120 8 long ConcurrentCircularArrayQueueL0Pad.p13 0
128 8 long ConcurrentCircularArrayQueueL0Pad.p14 0
136 8 long ConcurrentCircularArrayQueue.mask 3
144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer [null, null, null, null]
148 4 int SpscArrayQueueColdField.lookAheadStep 1
152 8 long SpscArrayQueueL1Pad.p00 0
160 8 long SpscArrayQueueL1Pad.p01 0
168 8 long SpscArrayQueueL1Pad.p02 0
176 8 long SpscArrayQueueL1Pad.p03 0
184 8 long SpscArrayQueueL1Pad.p04 0
192 8 long SpscArrayQueueL1Pad.p05 0
200 8 long SpscArrayQueueL1Pad.p06 0
208 8 long SpscArrayQueueL1Pad.p07 0
216 8 long SpscArrayQueueL1Pad.p08 0
224 8 long SpscArrayQueueL1Pad.p09 0
232 8 long SpscArrayQueueL1Pad.p10 0
240 8 long SpscArrayQueueL1Pad.p11 0
248 8 long SpscArrayQueueL1Pad.p12 0
256 8 long SpscArrayQueueL1Pad.p13 0
264 8 long SpscArrayQueueL1Pad.p14 0
272 8 long SpscArrayQueueProducerIndexFields.producerIndex 0
280 8 long SpscArrayQueueProducerIndexFields.producerLimit 0
288 8 long SpscArrayQueueL2Pad.p00 0
296 8 long SpscArrayQueueL2Pad.p01 0
304 8 long SpscArrayQueueL2Pad.p02 0
312 8 long SpscArrayQueueL2Pad.p03 0
320 8 long SpscArrayQueueL2Pad.p04 0
328 8 long SpscArrayQueueL2Pad.p05 0
336 8 long SpscArrayQueueL2Pad.p06 0
344 8 long SpscArrayQueueL2Pad.p07 0
352 8 long SpscArrayQueueL2Pad.p08 0
360 8 long SpscArrayQueueL2Pad.p09 0
368 8 long SpscArrayQueueL2Pad.p10 0
376 8 long SpscArrayQueueL2Pad.p11 0
384 8 long SpscArrayQueueL2Pad.p12 0
392 8 long SpscArrayQueueL2Pad.p13 0
400 8 long SpscArrayQueueL2Pad.p14 0
408 8 long SpscArrayQueueConsumerIndexField.consumerIndex 0
416 8 long SpscArrayQueueL3Pad.p00 0
424 8 long SpscArrayQueueL3Pad.p01 0
432 8 long SpscArrayQueueL3Pad.p02 0
440 8 long SpscArrayQueueL3Pad.p03 0
448 8 long SpscArrayQueueL3Pad.p04 0
456 8 long SpscArrayQueueL3Pad.p05 0
464 8 long SpscArrayQueueL3Pad.p06 0
472 8 long SpscArrayQueueL3Pad.p07 0
480 8 long SpscArrayQueueL3Pad.p08 0
488 8 long SpscArrayQueueL3Pad.p09 0
496 8 long SpscArrayQueueL3Pad.p10 0
504 8 long SpscArrayQueueL3Pad.p11 0
512 8 long SpscArrayQueueL3Pad.p12 0
520 8 long SpscArrayQueueL3Pad.p13 0
528 8 long SpscArrayQueueL3Pad.p14 0
Instance size: 536 bytes
© Copyright - TTNR Labs Limited 2018 12
Let’s use JOL!!!
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
© Copyright - TTNR Labs Limited 2018 13
Let’s use JOL!!!
org.jctools.queues.SpscArrayQueue object internals:
OFFSET SIZE TYPE DESCRIPTION
0 12 (object header)
12 4 (alignment/padding gap)
16 120 long ConcurrentCircularArrayQueueL0Pad.p00 [... p14]
136 8 long ConcurrentCircularArrayQueue.mask
144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer
148 4 int SpscArrayQueueColdField.lookAheadStep
152 120 long SpscArrayQueueL1Pad.p00 [... p14]
272 8 long SpscArrayQueueProducerIndexFields.producerIndex
280 8 long SpscArrayQueueProducerIndexFields.producerLimit
288 120 long SpscArrayQueueL2Pad.p00 [... p14]
408 8 long SpscArrayQueueConsumerIndexField.consumerIndex
416 120 long SpscArrayQueueL3Pad.p00 [... p14]
Instance size: 536 bytes
© Copyright - TTNR Labs Limited 2018 14
Let’s use JOL!!!
org.jctools.queues.SpscArrayQueue object internals:
OFFSET SIZE TYPE DESCRIPTION
0 12 (object header)
12 4 (alignment/padding gap)
16 120 long ConcurrentCircularArrayQueueL0Pad.p00 [... p14]
136 8 long ConcurrentCircularArrayQueue.mask
144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer
148 4 int SpscArrayQueueColdField.lookAheadStep
152 120 long SpscArrayQueueL1Pad.p00 [... p14]
272 8 long SpscArrayQueueProducerIndexFields.producerIndex
280 8 long SpscArrayQueueProducerIndexFields.producerLimit
288 120 long SpscArrayQueueL2Pad.p00 [... p14]
408 8 long SpscArrayQueueConsumerIndexField.consumerIndex
416 120 long SpscArrayQueueL3Pad.p00 [... p14]
Instance size: 536 bytes
© Copyright - TTNR Labs Limited 2018 15
Let’s use JOL!!!
org.jctools.queues.SpscArrayQueue object internals:
OFFSET SIZE TYPE DESCRIPTION
0 12 (object header)
12 4 (alignment/padding gap)
16 120 long ConcurrentQueueL0Pad.p00 [... p14]
136 8 long ConcurrentCircularArrayQueue.mask
144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer
148 4 int SpscArrayQueueColdField.lookAheadStep
152 120 long SpscArrayQueueL1Pad.p00 [... p14]
272 8 long SpscArrayQueueProducerIndexFields.producerIndex
280 8 long SpscArrayQueueProducerIndexFields.producerLimit
288 120 long SpscArrayQueueL2Pad.p00 [... p14]
408 8 long SpscArrayQueueConsumerIndexField.consumerIndex
416 120 long SpscArrayQueueL3Pad.p00 [... p14]
Instance size: 536 bytes
© Copyright - TTNR Labs Limited 2018 16
Let’s use JOL!!!
org.jctools.queues.SpscArrayQueue object internals:
OFFSET SIZE TYPE DESCRIPTION
0 12 (object header)
12 4 (alignment/padding gap)
16 120 long ConcurrentCircularArrayQueueL0Pad.p00 [... p14]
136 8 long ConcurrentCircularArrayQueue.mask
144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer
148 4 int SpscArrayQueueColdField.lookAheadStep
152 120 long SpscArrayQueueL1Pad.p00 [... p14]
272 8 long SpscArrayQueueProducerIndexFields.producerIndex
280 8 long SpscArrayQueueProducerIndexFields.producerLimit
288 120 long SpscArrayQueueL2Pad.p00 [... p14]
408 8 long SpscArrayQueueConsumerIndexField.consumerIndex
416 120 long SpscArrayQueueL3Pad.p00 [... p14]
Instance size: 536 bytes
© Copyright - TTNR Labs Limited 2018 17
Why bother with layout?
●
False sharing
●
Minimize load misses
●
Offheap?
– Atomicity
– Alignment requirements (line/page/sector)
– Different method (see Aeron)
© Copyright - TTNR Labs Limited 2018 18
Sharing?
header header pIndex cIndex mask buffer
© Copyright - TTNR Labs Limited 2018 19
False Sharing
header header pIndex cIndex Mask buffer
© Copyright - TTNR Labs Limited 2018 20
False Sharing
header header pIndex cIndex Mask buffer
© Copyright - TTNR Labs Limited 2018 21
Pad to resolve...
??? header header pad12 pad13 pad14 pad15 pad16
pad17 pad20 pad21 pad22 pad23 pad24 pad25 pad26
pIndex pMask pBuffer pad10 pad11 pad12 pad13 pad14
pad15 pad16 pad17 pad20 pad21 pad22 pad23 pad24
pad25 pad26 pad27 cIndex cMask cBuffer pad10 pad11
© Copyright - TTNR Labs Limited 2018 22
Alternatives To Method?
●
Rely on field order
●
@Contended (field/class level, -XX:-Restrict)
●
Go Offheap
© Copyright - TTNR Labs Limited 2018 23
What would be nice?
@ForceLayout(align=64) // state alignment
preference
Class X {
long field1; // still type aligned
byte[120] pad; // allocate presized array inline
long field2;
}
© Copyright - TTNR Labs Limited 2018
24
Unsafe
© Copyright - TTNR Labs Limited 2018 25
Looks safe enough...
abstract class ...ProducerIndexFields<E> extends SpscArrayQueueL1Pad<E>
{
final static long P_INDEX_OFFSET =
fieldOffset(...Fields.class, "producerIndex");
private volatile long producerIndex;
protected long producerLimit;
long lvProducerIndex() {
return producerIndex;
}
long lpProducerIndex() {
return UNSAFE.getLong(this, P_INDEX_OFFSET);
}
void soProducerIndex(long newValue) {
UNSAFE.putOrderedLong(this, P_INDEX_OFFSET, newValue);
}
}
© Copyright - TTNR Labs Limited 2018 26
Why do we need Unsafe?
●
Better than a poke in the eye with
a sharp stick?
●
Performance?
●
Compatibility????!??!?!?!?
© Copyright - TTNR Labs Limited 2018 27
Unsafe for Compatibility?
●
Just Works!™ (JDK6,7,8,11)
●
Shouldn’t we use VarHandle?
© Copyright - TTNR Labs Limited 2018 28
Unsafe vs. A*FU (pre-8u101)
With Unsafe : ~950 ns/op
With A*FU@7 : ~1350 ns/op
* Tested with JDK 7u80 as an example
© Copyright - TTNR Labs Limited 2018 29
Unsafe vs. A*FU (post 8u101/11)
With Unsafe : ~950 ns/op
With A*FU@8 : ~1050 ns/op
With A*FU@11 : ~1150 ns/op
See: https://shipilev.net/blog/2015/faster-atomic-fu/
© Copyright - TTNR Labs Limited 2018 30
Unsafe vs VarHandles (>=11)
With Unsafe : ~950 ns/op
With VarHandle : ~1150 ns/op
© Copyright - TTNR Labs Limited 2018 31
Unsafe for performance?
With Unsafe@7,8,11 : ~950 ns/op
With A*FU@7 : ~1300 ns/op
With A*FU@8 : ~1050 ns/op
With A*FU@11 : ~1150 ns/op
With VarHandle@11 : ~1150 ns/op
© Copyright - TTNR Labs Limited 2018 32
Unsafe vs Alternatives
●
Pre-JDK11?
– Volatile
– AtomicBoolean/Integer/Long/*Array
– Atomic*FieldUpdater
●
JDK11+?
– Should move to VarHandle?
– Multi-version jars
●
Offheap? GO UNSAFE!!!
© Copyright - TTNR Labs Limited 2018
33
Concurrency Specialization
© Copyright - TTNR Labs Limited 2018 34
Generic vs. Specialized
●
JDK?
– MPMC
– Lock based (ArrayBlockingQ/LinkedBlockingQ)
– Lock-less (ConcurrentLinkedQ)
– Allocation?
●
JCTools?
– MPMC/SPMC/MPSC/SPSC
– Array backed/Linked/both!
– Lock-less/lock-free/wait-free
– Zero allocation options
© Copyright - TTNR Labs Limited 2018 35
Generic vs. Specialized
●
Load: plain, opaque, volatile
●
Store: plain, ordered, volatile
●
compareAndSwap
●
getAndAdd
© Copyright - TTNR Labs Limited 2018 36
Single Writer
●
Load: plain/volatile
●
Store: ordered
© Copyright - TTNR Labs Limited 2018 37
Multi-Writer
●
Load: plain/volatile
●
Store: getAndAdd/compareAndSwap
© Copyright - TTNR Labs Limited 2018 38
Generic vs. Specialized
SPSC – 950 ns/op
MPSC – 6500 ns/op
MPMC – 5500 ns/op
CLQ – 13800 ns/op
ABQ – 36000 ns/op
© Copyright - TTNR Labs Limited 2018 39
Lock less/Lock free/Wait free
SPSC – 950 ns/op – WF
MPSC – 5000/6500 ns/op – LF/LL
MPMC – 5000/5500 ns/op – LF/LL
CLQ – 13800 ns/op – LL
ABQ – 36000 ns/op – Locks...
© Copyright - TTNR Labs Limited 2018 40
Allocations
CLQ – 2400b per op(100 messages)
LBQ – ~3000b
ABQ – ?
© Copyright - TTNR Labs Limited 2018 41
Allocations
CLQ – 2400b per op(100 messages)
LBQ – ~3000b
ABQ – ~1500b!!!
JCTools array backed - 0b
© Copyright - TTNR Labs Limited 2018
42
Helping the Compiler
© Copyright - TTNR Labs Limited 2018 43
Const vs Final
private final long size = <some power of 2>;
long offset = REF_ARRAY_BASE +
((consumerIndex % size) *
REF_ELEMENT_SCALE);
long offset = REF_ARRAY_BASE +
((consumerIndex & (size-1)) *
REF_ELEMENT_SCALE);
© Copyright - TTNR Labs Limited 2018 44
Const vs Final
long offset = REF_ARRAY_BASE +
((consumerIndex & (size-1)) *
REF_ELEMENT_SCALE);
long offset = REF_ARRAY_BASE +
((consumerIndex & mask) <<
REF_ELEMENT_SHIFT);
© Copyright - TTNR Labs Limited 2018 45
Const vs Final
long offset = REF_ARRAY_BASE +
((consumerIndex & mask) <<
REF_ELEMENT_SHIFT);
long offset = calcElementOffset(consumerIndex, mask);
© Copyright - TTNR Labs Limited 2018 46
Loads Across an Volatile Load
public E poll()
{
long consumerIndex = lpConsumerIndex();
E[] buffer = this.buffer;
long offset = calcElementOffset(consumerIndex, mask);
E e = (E) UNSAFE.getObjectVolatile(buffer, offset);
if (null == e)
return null; // EMPTY
UNSAFE.putOrderedObject(buffer, offset, null);
soConsumerIndex(consumerIndex + 1);
return e;
}
© Copyright - TTNR Labs Limited 2018
47
A Novel Algorithm
© Copyright - TTNR Labs Limited 2018 48
Typical Offerings
●
Array backed
– Pre-allocated, fixed sized
●
Linked
– Allocate cell per element
– No pre-allocation
– Bounded/Unbounded
© Copyright - TTNR Labs Limited 2018 49
Actor Use Case
●
Lots of queues
●
Some full & busy
●
Most empty
●
MPSC
© Copyright - TTNR Labs Limited 2018 50
Linked Array Queues?
●
Middle Ground?
●
Allocate a multi-element ‘cell/chunk’
●
Stay in ‘cell/chunk’? → no allocation
●
Resizing on the fly
© Copyright - TTNR Labs Limited 2018 51
Challenges?
●
Lock less algo?
●
Linking new array
●
Sizing
© Copyright - TTNR Labs Limited 2018 52
Hanging on a Bit
●
Steal producerIndex parity bit
– Fixup code to match
●
CAS parity to block ALL other producers
– Let them spin
© Copyright - TTNR Labs Limited 2018 53
JUMP indicator
●
Notify consumer needs to jump
●
Use extra cell in array as next pointer
© Copyright - TTNR Labs Limited 2018 54
Performance?
●
Slightly slower than array backed MPSC
●
Very similar throughput
●
Allocates as needed:
– Stay in chunk → no allocation
– Growable → allocate until capacity
– Chunked → bounded, fixed chunks
– Unbounded → blow yer heap, in fixed increments!
© Copyright - TTNR Labs Limited 2018
55
In Closing...
© Copyright - TTNR Labs Limited 2018
56
Thanks!
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
jctools-algorithms-optimization

Weitere ähnliche Inhalte

Ähnlich wie Novel Algos and Optimizations in JCTools Concurrent Queues

Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
Sematext Group, Inc.
 

Ähnlich wie Novel Algos and Optimizations in JCTools Concurrent Queues (20)

Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
Getting up to speed with Kafka Connect: from the basics to the latest feature...
Getting up to speed with Kafka Connect: from the basics to the latest feature...Getting up to speed with Kafka Connect: from the basics to the latest feature...
Getting up to speed with Kafka Connect: from the basics to the latest feature...
 
Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
 
Join FIWARE Lab
Join FIWARE LabJoin FIWARE Lab
Join FIWARE Lab
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost Model
 
Montreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpMontreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUp
 
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
WSO2 Product Release Webinar: WSO2 Complex Event Processor 4.0
 
Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.
Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.
Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.
 
20180921_DOAG_BigDataDays_OracleSpatialandPython_kpatenge
20180921_DOAG_BigDataDays_OracleSpatialandPython_kpatenge20180921_DOAG_BigDataDays_OracleSpatialandPython_kpatenge
20180921_DOAG_BigDataDays_OracleSpatialandPython_kpatenge
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120
 
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
 
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
 
2015 Java update and roadmap, JUG sevilla
2015  Java update and roadmap, JUG sevilla2015  Java update and roadmap, JUG sevilla
2015 Java update and roadmap, JUG sevilla
 
20190423 meetup japan_public
20190423 meetup japan_public20190423 meetup japan_public
20190423 meetup japan_public
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
OpenTelemetry Introduction
OpenTelemetry Introduction OpenTelemetry Introduction
OpenTelemetry Introduction
 
Interactive Java Support to your tool -- The JShell API and Architecture
Interactive Java Support to your tool -- The JShell API and ArchitectureInteractive Java Support to your tool -- The JShell API and Architecture
Interactive Java Support to your tool -- The JShell API and Architecture
 
OpenStack API's and WSGI
OpenStack API's and WSGIOpenStack API's and WSGI
OpenStack API's and WSGI
 
OpenStack Ottawa Meetup - October 2018
OpenStack Ottawa Meetup - October 2018OpenStack Ottawa Meetup - October 2018
OpenStack Ottawa Meetup - October 2018
 

Mehr von C4Media

Mehr von C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Novel Algos and Optimizations in JCTools Concurrent Queues

  • 1. © Copyright - TTNR Labs Limited 2018 1 JCTools: Pit Of Performance Nitsan Wakart/@nitsanw
  • 2. InfoQ.com: News & Community Site Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ jctools-algorithms-optimization • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon London www.qconlondon.com
  • 4. © Copyright - TTNR Labs Limited 2018 2 You Look Familiar...
  • 5. © Copyright - TTNR Labs Limited 2018 3 Why me? ● Main JCTools contributor ● Performance Eng. ● Find me on: – GitHub : nitsanw – Blog : psy-lob-saw.blogspot.com – Twitter : nitsanw ● Also: Cape Town JUG Organizer
  • 6. © Copyright - TTNR Labs Limited 2018 4 JCTools? ● Concurrent Queues: – SPSC/MPSC/SPMC/MPMC – Linked/Array/LinkedArray/Compound ● And MOAR!!!
  • 7. © Copyright - TTNR Labs Limited 2018 5 What is this about? ● Optimizations as found in the code ● Design pressures/choices ● Novel algorithm: MPSC linked queues
  • 8. © Copyright - TTNR Labs Limited 2018 6 From the codebase... ● Layout ● Unsafe ● Concurrency specialization ● Bits and bobs
  • 9. © Copyright - TTNR Labs Limited 2018 7 Layout abstract class ...L1Pad<E> extends ...ColdField<E> { long p00,p01,p02,p03,p04,p05,p06,p07, p08,p09,p10,p11,p12,p13,p14; } abstract class ...ProducerIndexFields<E> extends ...L1Pad<E> { private volatile long producerIndex; protected long producerLimit; } abstract class ...L2Pad<E> extends ...ProducerIndexFields<E> { long p00,p01,p02,p03,p04,p05,p06,p07, p08,p09,p10,p11,p12,p13,p14; }
  • 10. © Copyright - TTNR Labs Limited 2018 8 Layout abstract class ...L1Pad<E> extends ...ColdField<E> { long p00,p01,p02,p03,p04,p05,p06,p07, p08,p09,p10,p11,p12,p13,p14; } abstract class ...ProducerIndexFields<E> extends ...L1Pad<E> { private volatile long producerIndex; protected long producerLimit; } abstract class ...L2Pad<E> extends ...ProducerIndexFields<E> { long p00,p01,p02,p03,p04,p05,p06,p07, p08,p09,p10,p11,p12,p13,p14; }
  • 11. © Copyright - TTNR Labs Limited 2018 9 Measure with/with out: With padding: ~950 ns/op Sans padding: ~5000 ns/op *QueueBurst, 100 messages benchmark
  • 12. © Copyright - TTNR Labs Limited 2018 10 Java Object Layout ● 8b aligned ● Object header (8/12/16b) ● Sub-classes fields after parent fields ● Field order within class can change
  • 13. © Copyright - TTNR Labs Limited 2018 11 Let’s use JOL!!! # Running 64-bit HotSpot VM. # Using compressed oop with 3-bit shift. # Using compressed klass with 3-bit shift. # WARNING | Compressed references base/shifts are guessed by the experiment! # WARNING | Therefore, computed addresses are just guesses, and ARE NOT RELIABLE. # WARNING | Make sure to attach Serviceability Agent to get the reliable addresses. # Objects are 8 bytes aligned. # Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes] # Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes] Instantiated the sample instance via public org.jctools.queues.FFBuffer(int) org.jctools.queues.SpscArrayQueue object internals: OFFSET SIZE TYPE DESCRIPTION VALUE 0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1) 4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0) 8 4 (object header) 37 28 01 f8 (00110111 00101000 00000001 11111000) (-134141897) 12 4 (alignment/padding gap) 16 8 long ConcurrentCircularArrayQueueL0Pad.p00 0 24 8 long ConcurrentCircularArrayQueueL0Pad.p01 0 32 8 long ConcurrentCircularArrayQueueL0Pad.p02 0 40 8 long ConcurrentCircularArrayQueueL0Pad.p03 0 48 8 long ConcurrentCircularArrayQueueL0Pad.p04 0 56 8 long ConcurrentCircularArrayQueueL0Pad.p05 0 64 8 long ConcurrentCircularArrayQueueL0Pad.p06 0 72 8 long ConcurrentCircularArrayQueueL0Pad.p07 0 80 8 long ConcurrentCircularArrayQueueL0Pad.p08 0 88 8 long ConcurrentCircularArrayQueueL0Pad.p09 0 96 8 long ConcurrentCircularArrayQueueL0Pad.p10 0 104 8 long ConcurrentCircularArrayQueueL0Pad.p11 0 112 8 long ConcurrentCircularArrayQueueL0Pad.p12 0 120 8 long ConcurrentCircularArrayQueueL0Pad.p13 0 128 8 long ConcurrentCircularArrayQueueL0Pad.p14 0 136 8 long ConcurrentCircularArrayQueue.mask 3 144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer [null, null, null, null] 148 4 int SpscArrayQueueColdField.lookAheadStep 1 152 8 long SpscArrayQueueL1Pad.p00 0 160 8 long SpscArrayQueueL1Pad.p01 0 168 8 long SpscArrayQueueL1Pad.p02 0 176 8 long SpscArrayQueueL1Pad.p03 0 184 8 long SpscArrayQueueL1Pad.p04 0 192 8 long SpscArrayQueueL1Pad.p05 0 200 8 long SpscArrayQueueL1Pad.p06 0 208 8 long SpscArrayQueueL1Pad.p07 0 216 8 long SpscArrayQueueL1Pad.p08 0 224 8 long SpscArrayQueueL1Pad.p09 0 232 8 long SpscArrayQueueL1Pad.p10 0 240 8 long SpscArrayQueueL1Pad.p11 0 248 8 long SpscArrayQueueL1Pad.p12 0 256 8 long SpscArrayQueueL1Pad.p13 0 264 8 long SpscArrayQueueL1Pad.p14 0 272 8 long SpscArrayQueueProducerIndexFields.producerIndex 0 280 8 long SpscArrayQueueProducerIndexFields.producerLimit 0 288 8 long SpscArrayQueueL2Pad.p00 0 296 8 long SpscArrayQueueL2Pad.p01 0 304 8 long SpscArrayQueueL2Pad.p02 0 312 8 long SpscArrayQueueL2Pad.p03 0 320 8 long SpscArrayQueueL2Pad.p04 0 328 8 long SpscArrayQueueL2Pad.p05 0 336 8 long SpscArrayQueueL2Pad.p06 0 344 8 long SpscArrayQueueL2Pad.p07 0 352 8 long SpscArrayQueueL2Pad.p08 0 360 8 long SpscArrayQueueL2Pad.p09 0 368 8 long SpscArrayQueueL2Pad.p10 0 376 8 long SpscArrayQueueL2Pad.p11 0 384 8 long SpscArrayQueueL2Pad.p12 0 392 8 long SpscArrayQueueL2Pad.p13 0 400 8 long SpscArrayQueueL2Pad.p14 0 408 8 long SpscArrayQueueConsumerIndexField.consumerIndex 0 416 8 long SpscArrayQueueL3Pad.p00 0 424 8 long SpscArrayQueueL3Pad.p01 0 432 8 long SpscArrayQueueL3Pad.p02 0 440 8 long SpscArrayQueueL3Pad.p03 0 448 8 long SpscArrayQueueL3Pad.p04 0 456 8 long SpscArrayQueueL3Pad.p05 0 464 8 long SpscArrayQueueL3Pad.p06 0 472 8 long SpscArrayQueueL3Pad.p07 0 480 8 long SpscArrayQueueL3Pad.p08 0 488 8 long SpscArrayQueueL3Pad.p09 0 496 8 long SpscArrayQueueL3Pad.p10 0 504 8 long SpscArrayQueueL3Pad.p11 0 512 8 long SpscArrayQueueL3Pad.p12 0 520 8 long SpscArrayQueueL3Pad.p13 0 528 8 long SpscArrayQueueL3Pad.p14 0 Instance size: 536 bytes
  • 14. © Copyright - TTNR Labs Limited 2018 12 Let’s use JOL!!! # Running 64-bit HotSpot VM. # Using compressed oop with 3-bit shift. # Using compressed klass with 3-bit shift. # Objects are 8 bytes aligned. # Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes] # Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
  • 15. © Copyright - TTNR Labs Limited 2018 13 Let’s use JOL!!! org.jctools.queues.SpscArrayQueue object internals: OFFSET SIZE TYPE DESCRIPTION 0 12 (object header) 12 4 (alignment/padding gap) 16 120 long ConcurrentCircularArrayQueueL0Pad.p00 [... p14] 136 8 long ConcurrentCircularArrayQueue.mask 144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer 148 4 int SpscArrayQueueColdField.lookAheadStep 152 120 long SpscArrayQueueL1Pad.p00 [... p14] 272 8 long SpscArrayQueueProducerIndexFields.producerIndex 280 8 long SpscArrayQueueProducerIndexFields.producerLimit 288 120 long SpscArrayQueueL2Pad.p00 [... p14] 408 8 long SpscArrayQueueConsumerIndexField.consumerIndex 416 120 long SpscArrayQueueL3Pad.p00 [... p14] Instance size: 536 bytes
  • 16. © Copyright - TTNR Labs Limited 2018 14 Let’s use JOL!!! org.jctools.queues.SpscArrayQueue object internals: OFFSET SIZE TYPE DESCRIPTION 0 12 (object header) 12 4 (alignment/padding gap) 16 120 long ConcurrentCircularArrayQueueL0Pad.p00 [... p14] 136 8 long ConcurrentCircularArrayQueue.mask 144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer 148 4 int SpscArrayQueueColdField.lookAheadStep 152 120 long SpscArrayQueueL1Pad.p00 [... p14] 272 8 long SpscArrayQueueProducerIndexFields.producerIndex 280 8 long SpscArrayQueueProducerIndexFields.producerLimit 288 120 long SpscArrayQueueL2Pad.p00 [... p14] 408 8 long SpscArrayQueueConsumerIndexField.consumerIndex 416 120 long SpscArrayQueueL3Pad.p00 [... p14] Instance size: 536 bytes
  • 17. © Copyright - TTNR Labs Limited 2018 15 Let’s use JOL!!! org.jctools.queues.SpscArrayQueue object internals: OFFSET SIZE TYPE DESCRIPTION 0 12 (object header) 12 4 (alignment/padding gap) 16 120 long ConcurrentQueueL0Pad.p00 [... p14] 136 8 long ConcurrentCircularArrayQueue.mask 144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer 148 4 int SpscArrayQueueColdField.lookAheadStep 152 120 long SpscArrayQueueL1Pad.p00 [... p14] 272 8 long SpscArrayQueueProducerIndexFields.producerIndex 280 8 long SpscArrayQueueProducerIndexFields.producerLimit 288 120 long SpscArrayQueueL2Pad.p00 [... p14] 408 8 long SpscArrayQueueConsumerIndexField.consumerIndex 416 120 long SpscArrayQueueL3Pad.p00 [... p14] Instance size: 536 bytes
  • 18. © Copyright - TTNR Labs Limited 2018 16 Let’s use JOL!!! org.jctools.queues.SpscArrayQueue object internals: OFFSET SIZE TYPE DESCRIPTION 0 12 (object header) 12 4 (alignment/padding gap) 16 120 long ConcurrentCircularArrayQueueL0Pad.p00 [... p14] 136 8 long ConcurrentCircularArrayQueue.mask 144 4 java.lang.Object[] ConcurrentCircularArrayQueue.buffer 148 4 int SpscArrayQueueColdField.lookAheadStep 152 120 long SpscArrayQueueL1Pad.p00 [... p14] 272 8 long SpscArrayQueueProducerIndexFields.producerIndex 280 8 long SpscArrayQueueProducerIndexFields.producerLimit 288 120 long SpscArrayQueueL2Pad.p00 [... p14] 408 8 long SpscArrayQueueConsumerIndexField.consumerIndex 416 120 long SpscArrayQueueL3Pad.p00 [... p14] Instance size: 536 bytes
  • 19. © Copyright - TTNR Labs Limited 2018 17 Why bother with layout? ● False sharing ● Minimize load misses ● Offheap? – Atomicity – Alignment requirements (line/page/sector) – Different method (see Aeron)
  • 20. © Copyright - TTNR Labs Limited 2018 18 Sharing? header header pIndex cIndex mask buffer
  • 21. © Copyright - TTNR Labs Limited 2018 19 False Sharing header header pIndex cIndex Mask buffer
  • 22. © Copyright - TTNR Labs Limited 2018 20 False Sharing header header pIndex cIndex Mask buffer
  • 23. © Copyright - TTNR Labs Limited 2018 21 Pad to resolve... ??? header header pad12 pad13 pad14 pad15 pad16 pad17 pad20 pad21 pad22 pad23 pad24 pad25 pad26 pIndex pMask pBuffer pad10 pad11 pad12 pad13 pad14 pad15 pad16 pad17 pad20 pad21 pad22 pad23 pad24 pad25 pad26 pad27 cIndex cMask cBuffer pad10 pad11
  • 24. © Copyright - TTNR Labs Limited 2018 22 Alternatives To Method? ● Rely on field order ● @Contended (field/class level, -XX:-Restrict) ● Go Offheap
  • 25. © Copyright - TTNR Labs Limited 2018 23 What would be nice? @ForceLayout(align=64) // state alignment preference Class X { long field1; // still type aligned byte[120] pad; // allocate presized array inline long field2; }
  • 26. © Copyright - TTNR Labs Limited 2018 24 Unsafe
  • 27. © Copyright - TTNR Labs Limited 2018 25 Looks safe enough... abstract class ...ProducerIndexFields<E> extends SpscArrayQueueL1Pad<E> { final static long P_INDEX_OFFSET = fieldOffset(...Fields.class, "producerIndex"); private volatile long producerIndex; protected long producerLimit; long lvProducerIndex() { return producerIndex; } long lpProducerIndex() { return UNSAFE.getLong(this, P_INDEX_OFFSET); } void soProducerIndex(long newValue) { UNSAFE.putOrderedLong(this, P_INDEX_OFFSET, newValue); } }
  • 28. © Copyright - TTNR Labs Limited 2018 26 Why do we need Unsafe? ● Better than a poke in the eye with a sharp stick? ● Performance? ● Compatibility????!??!?!?!?
  • 29. © Copyright - TTNR Labs Limited 2018 27 Unsafe for Compatibility? ● Just Works!™ (JDK6,7,8,11) ● Shouldn’t we use VarHandle?
  • 30. © Copyright - TTNR Labs Limited 2018 28 Unsafe vs. A*FU (pre-8u101) With Unsafe : ~950 ns/op With A*FU@7 : ~1350 ns/op * Tested with JDK 7u80 as an example
  • 31. © Copyright - TTNR Labs Limited 2018 29 Unsafe vs. A*FU (post 8u101/11) With Unsafe : ~950 ns/op With A*FU@8 : ~1050 ns/op With A*FU@11 : ~1150 ns/op See: https://shipilev.net/blog/2015/faster-atomic-fu/
  • 32. © Copyright - TTNR Labs Limited 2018 30 Unsafe vs VarHandles (>=11) With Unsafe : ~950 ns/op With VarHandle : ~1150 ns/op
  • 33. © Copyright - TTNR Labs Limited 2018 31 Unsafe for performance? With Unsafe@7,8,11 : ~950 ns/op With A*FU@7 : ~1300 ns/op With A*FU@8 : ~1050 ns/op With A*FU@11 : ~1150 ns/op With VarHandle@11 : ~1150 ns/op
  • 34. © Copyright - TTNR Labs Limited 2018 32 Unsafe vs Alternatives ● Pre-JDK11? – Volatile – AtomicBoolean/Integer/Long/*Array – Atomic*FieldUpdater ● JDK11+? – Should move to VarHandle? – Multi-version jars ● Offheap? GO UNSAFE!!!
  • 35. © Copyright - TTNR Labs Limited 2018 33 Concurrency Specialization
  • 36. © Copyright - TTNR Labs Limited 2018 34 Generic vs. Specialized ● JDK? – MPMC – Lock based (ArrayBlockingQ/LinkedBlockingQ) – Lock-less (ConcurrentLinkedQ) – Allocation? ● JCTools? – MPMC/SPMC/MPSC/SPSC – Array backed/Linked/both! – Lock-less/lock-free/wait-free – Zero allocation options
  • 37. © Copyright - TTNR Labs Limited 2018 35 Generic vs. Specialized ● Load: plain, opaque, volatile ● Store: plain, ordered, volatile ● compareAndSwap ● getAndAdd
  • 38. © Copyright - TTNR Labs Limited 2018 36 Single Writer ● Load: plain/volatile ● Store: ordered
  • 39. © Copyright - TTNR Labs Limited 2018 37 Multi-Writer ● Load: plain/volatile ● Store: getAndAdd/compareAndSwap
  • 40. © Copyright - TTNR Labs Limited 2018 38 Generic vs. Specialized SPSC – 950 ns/op MPSC – 6500 ns/op MPMC – 5500 ns/op CLQ – 13800 ns/op ABQ – 36000 ns/op
  • 41. © Copyright - TTNR Labs Limited 2018 39 Lock less/Lock free/Wait free SPSC – 950 ns/op – WF MPSC – 5000/6500 ns/op – LF/LL MPMC – 5000/5500 ns/op – LF/LL CLQ – 13800 ns/op – LL ABQ – 36000 ns/op – Locks...
  • 42. © Copyright - TTNR Labs Limited 2018 40 Allocations CLQ – 2400b per op(100 messages) LBQ – ~3000b ABQ – ?
  • 43. © Copyright - TTNR Labs Limited 2018 41 Allocations CLQ – 2400b per op(100 messages) LBQ – ~3000b ABQ – ~1500b!!! JCTools array backed - 0b
  • 44. © Copyright - TTNR Labs Limited 2018 42 Helping the Compiler
  • 45. © Copyright - TTNR Labs Limited 2018 43 Const vs Final private final long size = <some power of 2>; long offset = REF_ARRAY_BASE + ((consumerIndex % size) * REF_ELEMENT_SCALE); long offset = REF_ARRAY_BASE + ((consumerIndex & (size-1)) * REF_ELEMENT_SCALE);
  • 46. © Copyright - TTNR Labs Limited 2018 44 Const vs Final long offset = REF_ARRAY_BASE + ((consumerIndex & (size-1)) * REF_ELEMENT_SCALE); long offset = REF_ARRAY_BASE + ((consumerIndex & mask) << REF_ELEMENT_SHIFT);
  • 47. © Copyright - TTNR Labs Limited 2018 45 Const vs Final long offset = REF_ARRAY_BASE + ((consumerIndex & mask) << REF_ELEMENT_SHIFT); long offset = calcElementOffset(consumerIndex, mask);
  • 48. © Copyright - TTNR Labs Limited 2018 46 Loads Across an Volatile Load public E poll() { long consumerIndex = lpConsumerIndex(); E[] buffer = this.buffer; long offset = calcElementOffset(consumerIndex, mask); E e = (E) UNSAFE.getObjectVolatile(buffer, offset); if (null == e) return null; // EMPTY UNSAFE.putOrderedObject(buffer, offset, null); soConsumerIndex(consumerIndex + 1); return e; }
  • 49. © Copyright - TTNR Labs Limited 2018 47 A Novel Algorithm
  • 50. © Copyright - TTNR Labs Limited 2018 48 Typical Offerings ● Array backed – Pre-allocated, fixed sized ● Linked – Allocate cell per element – No pre-allocation – Bounded/Unbounded
  • 51. © Copyright - TTNR Labs Limited 2018 49 Actor Use Case ● Lots of queues ● Some full & busy ● Most empty ● MPSC
  • 52. © Copyright - TTNR Labs Limited 2018 50 Linked Array Queues? ● Middle Ground? ● Allocate a multi-element ‘cell/chunk’ ● Stay in ‘cell/chunk’? → no allocation ● Resizing on the fly
  • 53. © Copyright - TTNR Labs Limited 2018 51 Challenges? ● Lock less algo? ● Linking new array ● Sizing
  • 54. © Copyright - TTNR Labs Limited 2018 52 Hanging on a Bit ● Steal producerIndex parity bit – Fixup code to match ● CAS parity to block ALL other producers – Let them spin
  • 55. © Copyright - TTNR Labs Limited 2018 53 JUMP indicator ● Notify consumer needs to jump ● Use extra cell in array as next pointer
  • 56. © Copyright - TTNR Labs Limited 2018 54 Performance? ● Slightly slower than array backed MPSC ● Very similar throughput ● Allocates as needed: – Stay in chunk → no allocation – Growable → allocate until capacity – Chunked → bounded, fixed chunks – Unbounded → blow yer heap, in fixed increments!
  • 57. © Copyright - TTNR Labs Limited 2018 55 In Closing...
  • 58. © Copyright - TTNR Labs Limited 2018 56 Thanks!
  • 59. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ jctools-algorithms-optimization