15. Stuff that did not work for various reasons
1. RDBMS
2. Actors
3. SEDA
4. J2EE
…
[Diagram: Service / Transaction Processor built as a chain of stages (Receive, Unmarshal, Journal, Replicate, Business Logic, Marshal, Send) connected by queues]
15 20.04.12 Fußzeilentext
17. Service / Transaction Processor
[Diagram: chain of stages (Receive, Unmarshal, Journal, Replicate, Business Logic, Marshal, Send) with a queue between neighbouring stages]
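18. Linked List Queue and Array Queue
[Diagram: Linked List Queue: a chain of nodes with Add at one end, Remove at the other, and a shared Size field. Array Queue: Add and Remove positions within an array that spans several cache lines.]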
19. Queue as a data structure
Problems with Queues
1. Reading (take) and writing (add) are both write accesses
   => write contention
2. Write contention is solved with locks
   1. Other solutions include deques
3. Locks lead to context switches into the kernel
   1. Context switches lead to CPU cache misses etc.
   2. The kernel might use the opportunity to do other work as well
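The first two points can be made concrete with a minimal sketch of a lock-based bounded queue. The class name LockedArrayQueueSketch and its fields are hypothetical and exist only for this illustration; it is not the queue implementation measured in the LMAX paper. Note that take(), although it "reads" from the queue, still writes to head and size, so producers and consumers contend for the same lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical minimal bounded queue, for illustration only.
class LockedArrayQueueSketch {
    private final Object[] slots;
    private int head; // next slot to take from; written by take()
    private int tail; // next slot to add to; written by add()
    private int size; // written by BOTH add() and take()
    private final ReentrantLock lock = new ReentrantLock();

    LockedArrayQueueSketch(int capacity) {
        slots = new Object[capacity];
    }

    // Returns false when the queue is full.
    boolean add(Object item) {
        lock.lock(); // under contention the waiting thread may be parked by the kernel
        try {
            if (size == slots.length) {
                return false;
            }
            slots[tail] = item;
            tail = (tail + 1) % slots.length;
            size++; // write to state shared with take()
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Returns null when the queue is empty.
    Object take() {
        lock.lock();
        try {
            if (size == 0) {
                return null;
            }
            Object item = slots[head];
            slots[head] = null; // "reading" still writes: the slot is cleared
            head = (head + 1) % slots.length;
            size--; // write to state shared with add()
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```

Every call, whether add or take, goes through the same lock and updates size, which is the write contention named above.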
20. Locks
Costs according to the LMAX paper

Method                               Time in ms
Single thread                               300
Single thread with lock                  10,000
Two threads with lock                   224,000
Single thread with CAS                    5,700
Two threads with CAS                     30,000
Single thread with volatile write         4,700
CAS = "Compare And Swap": AtomicReference etc. in Java => no context switch
Volatile write: memory read/write barrier
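The paper's figures come from incrementing a 64-bit counter in a loop. The single-threaded variants can be sketched as follows. The class and method names are hypothetical, the loop count is an arbitrary choice, and timings from such a naive loop depend heavily on JIT and hardware, so this shows only the mechanisms being compared and does not reproduce the numbers.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the single-threaded counter variants.
class CounterCostSketch {
    private static volatile long volatileCounter;

    // Plain local counter: no synchronization at all.
    static long plain(long n) {
        long counter = 0;
        for (long i = 0; i < n; i++) {
            counter++;
        }
        return counter;
    }

    // Every increment acquires and releases a lock.
    static long withLock(long n) {
        ReentrantLock lock = new ReentrantLock();
        long counter = 0;
        for (long i = 0; i < n; i++) {
            lock.lock();
            try {
                counter++;
            } finally {
                lock.unlock();
            }
        }
        return counter;
    }

    // Every increment is a compare-and-swap retry loop; no lock is taken.
    static long withCas(long n) {
        AtomicLong counter = new AtomicLong();
        for (long i = 0; i < n; i++) {
            long current;
            do {
                current = counter.get();
            } while (!counter.compareAndSet(current, current + 1));
        }
        return counter.get();
    }

    // Every increment is a volatile write, which implies a memory barrier.
    static long withVolatileWrite(long n) {
        volatileCounter = 0;
        for (long i = 0; i < n; i++) {
            volatileCounter = i + 1;
        }
        return volatileCounter;
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // arbitrary loop count for the sketch
        long start = System.nanoTime();
        plain(n);
        System.out.println("plain ms:    " + (System.nanoTime() - start) / 1_000_000);
        start = System.nanoTime();
        withLock(n);
        System.out.println("lock ms:     " + (System.nanoTime() - start) / 1_000_000);
        start = System.nanoTime();
        withCas(n);
        System.out.println("CAS ms:      " + (System.nanoTime() - start) / 1_000_000);
        start = System.nanoTime();
        withVolatileWrite(n);
        System.out.println("volatile ms: " + (System.nanoTime() - start) / 1_000_000);
    }
}
```

The two-thread rows of the table would run withLock or withCas against one shared counter from two threads, which is where contention makes the costs rise sharply.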
28. HA Node
[Diagram: components of an HA node: Receiver, Input Disruptor, Journaler, Replicator, Un-Marshaller, Business Logic Handler, Output Disruptor, Marshaller, Publisher, File System]
Each stage can have multiple threads.
29. Receiver writes on 31.
Journaler and Replicator read on 24 and can move up the sequence to 30.
Un-Marshaller can move beyond Journaler and Replicator, up to 30.
Business Logic Handler needs to stay behind all others.
[Diagram: sequence positions: Receiver 31, Journaler and Replicator 24, Un-Marshaller 19, Business Logic Handler 18]
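The rule on this slide can be sketched as "a handler may advance up to the lowest sequence among the components it depends on". The class SequenceGatingSketch and the method limit are hypothetical names for this illustration, not the Disruptor API. The numbers are the ones from the slide, assuming that a Receiver writing on 31 means 30 is the last completed slot.

```java
import java.util.Arrays;

// Hypothetical sketch of sequence gating between handlers.
class SequenceGatingSketch {
    // Highest sequence a handler may process: it must not pass
    // the slowest of the sequences it depends on.
    static long limit(long... dependsOn) {
        return Arrays.stream(dependsOn).min().getAsLong();
    }

    public static void main(String[] args) {
        long receiverCompleted = 30; // Receiver is writing 31, so 30 is complete
        long journaler = 24;
        long replicator = 24;
        long unMarshaller = 19;

        // Journaler, Replicator and Un-Marshaller depend only on the Receiver.
        System.out.println("Journaler may advance to " + limit(receiverCompleted));
        System.out.println("Un-Marshaller may advance to " + limit(receiverCompleted));

        // The Business Logic Handler depends on all three and stays behind them.
        System.out.println("Business logic may advance to "
                + limit(journaler, replicator, unMarshaller));
    }
}
```

With the slide's numbers the Business Logic Handler, currently at 18, may move to 19 and no further until the Un-Marshaller advances.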
39. LMAX Low Level Ideas
1. Simple Code
2. Everything in memory
3. Single threaded per CPU for business logic
4. Business logic has no I/O, I/O is done somewhere else
5. Scheduler “knows” dependencies of handlers
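Points 2 to 4 can be sketched with plain JDK classes. Everything here is an assumption for illustration: the class SingleThreadedLogicSketch, the account example, and the use of two single-thread executors as stand-ins for the business logic thread and the I/O side. LMAX connects these with Disruptors, not executor queues.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: in-memory state owned by one thread, I/O done elsewhere.
class SingleThreadedLogicSketch {
    // Touched only by the business logic thread, so no locks are needed.
    private final Map<String, Long> balances = new HashMap<>();
    private final ExecutorService logic = Executors.newSingleThreadExecutor();
    private final ExecutorService io = Executors.newSingleThreadExecutor();
    // Stands in for the network or disk that the I/O side would write to.
    final List<String> published = new CopyOnWriteArrayList<>();

    void deposit(String account, long amount) {
        logic.execute(() -> {
            // Pure in-memory business logic, no I/O here.
            long balance = balances.merge(account, amount, Long::sum);
            String event = account + "=" + balance;
            // Hand the result to the I/O thread and continue immediately.
            io.execute(() -> published.add(event));
        });
    }

    // Drains the logic thread first, then the I/O thread.
    void shutdown() {
        try {
            logic.shutdown();
            logic.awaitTermination(5, TimeUnit.SECONDS);
            io.shutdown();
            io.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because only one thread ever touches balances, the business logic needs no synchronization and never blocks on I/O.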
40. 6M TPS? How did LMAX do it?
3 billion instructions on a modern CPU

Throughput          What it takes
10K+ TPS            If you don't do anything stupid
100K+ TPS (x 10)    Clean, organized code; standard libraries
1000K+ TPS (x 10)   Custom, cache-friendly collections; performance testing; controlled GC; very well modeled domain
44. Sources
"Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads", Martin Thompson, Dave Farley, Michael Barker, Patricia Gee, Andrew Stewart, 2011
"The LMAX Architecture", Martin Fowler, 2011, http://martinfowler.com/articles/lmax.html
"How to do 100K+ TPS at less than 1ms latency", Martin Thompson, Michael Barker, 2010