Java has a solid Memory Model, and there are a couple of excellent libraries for concurrency. When you start working with threads however, pitfalls start appearing - especially if the program is supposed to be fast and correct. This session shows proven solutions for some typical problems, showing how to view program code from a concurrency perspective: Which threads share which data, and how? How to reduce the impact of locks? How to avoid them altogether - and when is that worth it?
2. Premature Optimization is
the root of all evil.
Donald Knuth
We should forget about small
efficiencies, say about 97% of
the time:
Yet we should not pass up our
opportunities in that critical 3%.
4. 1. Java Memory Model
2. Concepts and Paradigms
3. Performance
5. Threads (naive)
● Threads take turns doing work.
● Each Thread does what the source code says.
● When a thread is interrupted, other threads see
the intermediate state.
● Synchronization serves only to make several
changes atomic.
6. Threads (a little less naive)
● There are several CPUs / cores executing
threads really at the same time.
● Each thread does what the source code says.
● Data access goes to RAM.
● When different threads access the same data,
they do that one after the other.
7. Threads: the truth
● Within a single thread, the JVM makes it look
as if the thread does what the source code
says.
● There is a defined ordering between things
happening in different threads if
synchronization prescribes that („happens-
before“).
● And that is all!
11. Correct Synchronization
● No Race Conditions:
● If several threads access the same variable,
● and at least one of them writes,
● then access must be ordered by „happens-before“.
● If that is true for a program, then it will run as if
● all memory access was sequential
● and really went to RAM
● in the order defined by „happens-before“.
12. Example: volatile
● If
● Thread T1 writes a volatile Variable v, and
● Thread T2 then reads v,
● then
● all changes T1 made up to writing v are 'before' all
code in T2 after the read from v.
13. Memory Barriers
● Tools used to implement JMM
● Assembler instructions
● „synchronizing“ CPUs and Caches
● Constrain Reordering
● Potentially expensive
● Direct Cost: Cache Dirtying
● Limit Optimizations
● Especially between processors
● Where does Java use them?
● synchronized, Locks
● volatile
● after constructors (for final fields), …
19. Parallel Execution?
● Formatting can be expensive
● toString(): Callbacks to application code!
● Thread Pool for Logging?
● Message ordering is lost!
● … and doLog() ist not thread safe
21. Concurrency:
Shared Mutable State
● Locks: Blocking
● Read/Write and other optimizations
● Single Worker-Thread with a Message-Queue:
Non-Blocking
● Actors as a variant
● „Lock Free“ has share mutable state
● Every approach has its specific cost!
22. Worker Threads
● Queues
● Bounded / Unbounded
● Blocking / Non-Blocking
● Single / Multi Producers / Consumers
● Optimized for reading or writing (or a mix)
● Special features: Priority, remove(), Batch, …
● Future for results
● .thenAccept(...) / .thenAcceptAsync(...)
23. Which Thread Pool to use?
● Code with side effects
● (usually) ordering is important → Executors.newSingleThreadPool()
● non-blocking without Seiteneffekte
● ForkJoinPool.commonPool()
● blocking
● Specific ExecutorService per context
● Tuning, Monitoring, …
● Divide and conquer big proglems
● Measure to verify assumptions!
● Never ad hoc!
25. Sharing is the root
of all contention
● more precisely: sharing mutable Ressources
● Locks, I/O, …
● Hidden Sharing
● ReadLock: Counter
● UUID.randomUUID()
● volatile: Shared main memory
● Cache Lines (Locality, Poisoning)
● Disk, DVDs, Network
● …
● Solutions: Isolation or Immutability
26. Lock-free Programming
● Shared Mutable State
● Lock Free
● Callers need never wait
● Processing may be delayed however
● e.g. asynchronous decoupling
● Responsive, saves ressources
● Wait Free
● Processing without a delay
● Special algorithms and data structures
● Highly complex; Ongoing research, work in progress
● ConcurrentHashMap, ConcurrentLinkedQueue
27. CAS loop
final AtomicInteger n = new AtomicInteger (0);
int max (int i) {
int prev, next;
do {
prev = n.get();
next = Math.max (prev, i);
} while (! n.compareAndSet (prev, next));
return next;
}
28. Functional Programming
● No Seiteneffekte
● All Objects are immutable, changes create new objects
● not just using Lambdas: JDK collections, Guice, …
● Scala, Clojure; a-foundation
● different style of programming
● efficient copying: partial reuse of objects
● functional algorithms
● State is stable automatically, even concurrently
29. Performance-Tuning
for Concurrency
public class StockExchange {
private final Map<Currency, Double> rates = new HashMap<>();
private final Map<String, Double> pricesInEuro = new HashMap<>();
public void updateRate (Currency currency, double fromEuro) {
rates.put (currency, fromEuro);
}
public void updatePrice (String wkz, double euros) {
pricesInEuro.put (wkz, euros);
}
public double currentPrice (String wkz, Currency currency) {
return pricesInEuro.get (wkz) * rates.get (currency);
}
}
31. Fine-grained Locks:
Collections.synchronizedMap
public class StockExchange {
private final Map<Currency, Double> rates =
Collections.synchronizedMap (new HashMap<>());
private final Map<String, Double> pricesInEuro =
Collections.synchronizedMap (new HashMap<>());
…
}
32. Coarse Locks
public class StockExchange {
…
public synchronized void updateRate (…) {
…
}
public synchronized void updatePrice (…) {
…
}
public synchronized double currentPrice (…) {
…
}
}
33. Functional: Immutable Maps
public class StockExchange {
private final AtomicReference<AMap<Currency, Double>> rates =
new AtomicReference<> (AHashMap.empty ());
…
public void updateRate (Currency currency, double fromEuro) {
AMap<Currency, Double> prev, next;
do {
prev = rates.get ();
next = prev.updated (currency, fromEuro);
}
while (! rates.compareAndSet (prev, next));
}
…
}
35. Queue with a Worker-Thread
public class StockExchange {
private final Map<Currency, Double> rates = new HashMap<>();
…
private final BlockingQueue<Runnable> queue = …;
public StockExchange() {
new Thread(() -> {
while (true) queue.take().run();
}).start();
}
public void updateRate (Currency currency, double fromEuro) {
queue.put (() -> rates.put (currency, fromEuro));
}
…
}
36. Tests for comparison
for (int i=0; i<1_000_000; i++) {
stockExchange.updatePrice ("abc", 1.23);
}
volatile int v=0;
…
for (int i=0; i<1_000_000; i++) {
v=v;
stockExchange.updatePrice ("abc", 1.23);
}
volatile int v=0;
final LinkedList<String> l = new LinkedList<>();
…
for (int i=0; i<1_000_000; i++) {
l.add ("abc");
v=v;
stockExchange.updatePrice (l.remove(), 1.23);
}
40. And the Winner is...
● ConcurrentHashMap
● No consistency requirements, mostly reads, algorithmic access
● Queue with Worker Thread
● Transactionaler access, centralized Event-Queue
●
Data locality in multi-processor systems
● Immutable Map
●
Stable data during read operations
● long running reads
● „lossy“ → fast udate
● Locks
● never particularly fast → prescribed threading and data model
● Granularity: Lock per operation
41. Testing (1): Correctness
● „it wors“ is not enough
● JMM vs. JVM, Hardware, …
● Reviews
● controlled interruptions
● Shotgun
42. Testen (2): Performance
● Reserve time for testing!
● realistic Hardware
● big Hardware
● Multi-Core vs. Multi-Prozessor → HW optimizations for cache exchange
● Scaling effects
● realistic load scenarios (→ Know your load!!! Document assumptions!!!)
● Virtualized hardware
● Ask specific questions – there are many variables
● Pool sizes, cut-off for serial processing
● Hardware sizes: RAM, #Cores, …
● Series of measurements to evaluate alternatives
43. Practice: Local Parallelization
● e.g. searching in a large collection
● Fork/Join examples
●
Streams-API
● Verify improvements
● APIs are simple and invite „ad hoc“ use
● apparent improvements – at least without background load
●
overall CPU load is increased – use only if CPUs would
otherwise be idling!
● Measure, measure, measure: real hardware, real load
scenarios
44. Wrap-Up
● Concurrency is difficult
● Really understand your problems
● Measure, measure, measure!
● OS and hardware influence behavior
● Correctness before performance
● Korrektheit vor Performance
● Möglichst grobe Concurrency
● Share Nothing
45. The End
● Links:
● http://channel9.msdn.com/Shows/Going+Deep/Cpp
-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-
of-2
● http://github.com/arnohaase/a-foundation