Wait-free data structures on embedded multi-core systems
1. Tobias Fuchs
Evaluation of Task Scheduling Algorithms and Wait-Free Data Structures for Embedded Multi-Core Systems
• Master's thesis presentation
• Supervisor: Prof. Dr. Dieter Kranzlmüller
• Advisors: Dr. Karl Fürlinger (LMU)
Dr. Tobias Schüle (Siemens CT)
• Date of talk: 5 November 2014
2. Structure of this talk
1. Introduction
1. Motivation
2. Problem Statement and Objectives
2. Wait-free data structures
1. Foundations
2. Pools
3. Queues
4. Stacks
3. Task Scheduling
1. Work stealing
2. Prioritized work stealing in EMBB
4. Conclusion
Task Scheduling Algorithms and Wait-Free Data Structures for Embedded Multi-Core Systems 2
4. Motivation
Wait-free algorithms
• Strongest progress guarantee, and with it the strongest fault tolerance
• Guarantee progress of every thread and an upper bound on the execution time of each operation
Gains:
+ Progress is potentially a formal constraint in real-time
computing
+ Wait-freedom eliminates the classic concurrency problems:
Deadlocks, Priority Inversion, Convoying, Kill-Intolerance
5. Problem statement
State of the art
No suitable wait-free data structures for embedded systems:
• Employing mechanisms such as garbage collection
• Not designed for restricted resources
• No evaluation for latency
Challenges:
- Transforming data structures into wait-free equivalents is
non-trivial and usually requires a from-scratch redesign
- Implementations depend on platform architecture
6. Objectives
1. Review and evaluation of state of the art approaches for
suitability on embedded systems
2. Real-time compliant implementations of wait-free data
structures
3. Definition, implementation and evaluation of suitable
benchmark scenarios for wait-free data structures and
task scheduling algorithms
+ Automated verification derived from semantic definition
8. Progress conditions
Classification of progress
On the Nature of Progress (Herlihy, Shavit 2011)
9. Real-time requirements
Performance priorities on real-time systems
Guarantees on worst-case runtime behavior
Aim for latency / jitter-reduction, neglecting throughput
Avoid non-determinism, as in malloc / new (see: MISRA)
10. Evaluation methodology
Real-time applications are designed to optimize latency
Related work does not evaluate latency, but only mean or
median throughput
Evaluation of worst-case latency is tough:
• In related work, measurements outside of 97.5% confidence
interval are considered outliers and ignored
• These outliers are our data
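To illustrate why discarding outliers hides exactly the value that matters here, the following minimal sketch summarizes a set of latency samples by mean, median and maximum. The names `LatencySummary` and `Summarize` are hypothetical and not taken from the thesis.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical summary of per-operation latencies (e.g. in nanoseconds).
struct LatencySummary {
  double mean;
  std::uint64_t median;
  std::uint64_t worst;  // the value a real-time system has to budget for
};

LatencySummary Summarize(std::vector<std::uint64_t> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
  return LatencySummary{sum / samples.size(),
                        samples[samples.size() / 2],
                        samples.back()};
}
```

For the samples {100, 110, 105, 95, 9000} the median is 105, while the worst case is 9000. A throughput-oriented evaluation would drop the last sample as an outlier.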
11. Pools
12. Wait-free data structures: Pools
Pools
… realize dynamic memory allocation
… while eliminating heap fragmentation
• Fundamental data structure of any concurrent container
• Fixed number of objects in static or automatic memory
• Pools manage concurrent removal and reclamation of
objects
RemoveAny(pool, er)   Remove any element from the pool and return it in er
Add(pool, e)          Add element e back to the pool
13. Pools: Related work
Close to none:
• Several lock-free pools, e.g. tree-based
• Wait-free pools: array-based, simple yet inefficient
Why are wait-free pools hard to design?
Common wait-free paradigms require dynamic memory
allocation …
14. Array-based pools
Array-based wait-free pools
• Consist of an array holding atomic reservation flags
• Threads traverse reservation array from the beginning
and try to reserve a flag atomically (CAS)
• Index of successfully toggled flag is acquired element index
• Worst-case complexity: O(n)
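The scheme described above can be sketched in a few lines of C++. This is an illustration under assumptions, not the thesis implementation: the class name `ArrayPool` and the sentinel `kNone` are hypothetical, and the pool hands out element indices rather than objects.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Sketch of an array-based pool with N elements. The method names mirror
// the RemoveAny / Add operations; the class itself is hypothetical.
template <std::size_t N>
class ArrayPool {
 public:
  static constexpr std::size_t kNone = N;  // returned if nothing was acquired

  // Scans the flags from the beginning. At most N CAS attempts are made,
  // so the number of steps is bounded regardless of other threads.
  std::size_t RemoveAny() {
    for (std::size_t i = 0; i < N; ++i) {
      bool expected = false;
      if (reserved_[i].compare_exchange_strong(expected, true)) {
        return i;  // index of the toggled flag is the acquired element
      }
    }
    return kNone;
  }

  // Returns element `index` to the pool.
  void Add(std::size_t index) {
    assert(index < N);
    reserved_[index].store(false);
  }

 private:
  std::atomic<bool> reserved_[N] = {};  // all flags start as "free"
};
```

Every call visits each flag at most once, which gives the O(n) worst case. The price is that threads contend on the low indices, which is one reason such pools are simple yet inefficient.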
15. Compartment pool
Wait-free pool with thread-specific compartments
• Array-based pool with additional range of elements that
can only be acquired by a specific thread
• Threads acquire elements from their private compartment
first
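A possible layout for such a pool, as a sketch only: the element array is split into one private range per thread followed by a shared range. The class name, the template parameters and the explicit thread index `tid` are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Sketch: `Threads` private compartments of `PerThread` elements each,
// followed by `Shared` elements that any thread may acquire.
template <std::size_t Threads, std::size_t PerThread, std::size_t Shared>
class CompartmentPool {
 public:
  static constexpr std::size_t kSize = Threads * PerThread + Shared;
  static constexpr std::size_t kNone = kSize;

  // `tid` is the caller's thread index in [0, Threads).
  std::size_t RemoveAny(std::size_t tid) {
    assert(tid < Threads);
    // 1. Private compartment: no other thread acquires from this range.
    const std::size_t i = Reserve(tid * PerThread, (tid + 1) * PerThread);
    if (i != kNone) return i;
    // 2. Fall back to the shared range.
    return Reserve(Threads * PerThread, kSize);
  }

  void Add(std::size_t index) {
    assert(index < kSize);
    reserved_[index].store(false);
  }

 private:
  std::size_t Reserve(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
      bool expected = false;
      if (reserved_[i].compare_exchange_strong(expected, true)) return i;
    }
    return kNone;
  }

  std::atomic<bool> reserved_[kSize] = {};
};
```

As long as a thread finds a free element in its own compartment, it does not compete with other threads for the same flags.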
16. Wait-free data structures:
Pools - Evaluation
19. Queues
20. Queues: Related work
Kogan and Petrank presented the first wait-free queue for
multiple enqueuers and dequeuers
Wait-Free Queues With Multiple Enqueuers and Dequeuers (Kogan, Petrank, 2011)
- Implemented in Java
- Relying on garbage collection
- Requires monotonic counter (phase)
21. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign helping scheme to remove phase counter
• In original publication, new phase value is greater than all
phases of any announced operation (including non-pending)
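The phase rule of the original publication can be sketched as follows. This is a simplification: the Java original stores immutable descriptor objects in an atomic reference array, whereas the hypothetical `OpDesc` below uses two separate atomics and is only meant to show how the phase is chosen and checked.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified operation descriptor, one slot per thread (hypothetical).
struct OpDesc {
  std::atomic<std::int64_t> phase{-1};
  std::atomic<bool> pending{false};
};

// New phase = 1 + maximum over all announced phases, pending or not.
template <std::size_t N>
std::int64_t NextPhase(const std::array<OpDesc, N>& state) {
  std::int64_t max_phase = -1;
  for (const OpDesc& d : state) max_phase = std::max(max_phase, d.phase.load());
  return max_phase + 1;
}

// A helper only works on operations that are pending and not newer than
// the helper's own phase.
template <std::size_t N>
bool IsStillPending(const std::array<OpDesc, N>& state, std::size_t tid,
                    std::int64_t phase) {
  assert(tid < N);
  return state[tid].pending.load() && state[tid].phase.load() <= phase;
}
```

The phase value only ever grows, which is why the scheme needs a monotonic counter.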
22. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign helping scheme to remove phase counter
• Modification: Help all other pending operations first
• Possibly helping operations that are newer than the thread’s
own operation
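Without a phase there is no ordering between operations, only the information whether an operation is still pending. A minimal sketch of the resulting processing order, assuming that a thread helps every other thread's announced operation that is still pending before it works on its own (the function `HelpOrder` is hypothetical and purely illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the thread indices whose operations thread `self` works on, in
// order: every other pending operation first, its own operation last.
std::vector<std::size_t> HelpOrder(const std::vector<bool>& pending,
                                   std::size_t self) {
  assert(self < pending.size());
  std::vector<std::size_t> order;
  for (std::size_t tid = 0; tid < pending.size(); ++tid) {
    if (tid != self && pending[tid]) order.push_back(tid);
  }
  order.push_back(self);
  return order;
}
```

With pending flags {true, false, true, true}, thread 2 first helps threads 0 and 3 and then runs its own operation, even if those operations were announced later than its own.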
23. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign helping scheme to remove phase counter
• Fairness is maintained: all other threads are guaranteed
to help this thread’s operation before engaging in their own
24. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Memory reclamation
Hazard pointers scheme typically presented as a solution
Hazard pointers: Safe memory reclamation for lock-free objects (Michael, 2004)
25. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 1: Find upper memory bound for hazard pointers
Step 2: Guard queue nodes using hazard pointers
26. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 2: Guard queue nodes using hazard pointers
Culprit: Guarding is not wait-free
pointer p = node.Next;
// -- node.Next may change here --
while (hp.GuardPointer(p) && p != node.Next) {
  // Guard is stale: release it, re-read the pointer and retry.
  // The number of retries is unbounded.
  hp.ReleaseGuard(p);
  p = node.Next;
}
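The same pattern in compilable form, as a sketch under assumptions: a single global `hazard_slot` stands in for the per-thread hazard pointer array, and `Node` and `GuardNext` are hypothetical names that do not correspond to the EMBB API.

```cpp
#include <atomic>
#include <cassert>

struct Node {
  explicit Node(int v) : value(v), next(nullptr) {}
  int value;
  std::atomic<Node*> next;
};

// One hazard slot of one thread. A real scheme keeps several slots per
// thread, and reclaiming threads scan all of them before freeing a node.
std::atomic<Node*> hazard_slot{nullptr};

// Returns the successor of `node`, guarded by `hazard_slot`.
Node* GuardNext(Node* node) {
  assert(node != nullptr);
  Node* p = node->next.load();
  for (;;) {
    hazard_slot.store(p);                // publish the guard
    Node* current = node->next.load();   // validate: did next change?
    if (current == p) return p;          // guard was published in time
    p = current;                         // stale guard: retry, unbounded
  }
}
```

The loop ends only once no other thread has changed `next` between the read and the validation, so a thread can be delayed for as long as others keep modifying the node.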
27. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 2: Guard queue nodes using hazard pointers
Culprit: Guarding is not wait-free
Fortunately, retry loops can be avoided in the Kogan-
Petrank queue, but the implementation is not trivial
see implementation at
https://github.com/fuchsto/embb/tree/benchmark/
28. Queues - Evaluation
Queue benchmark scenarios
In addition to scenarios for bag semantics
• Buffer latency
Elements enqueued with current timestamp, difference from
timestamp at dequeue is buffer latency
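A minimal sketch of this measurement, with `std::queue` standing in for the concurrent queue under test. The names `Stamped`, `Enqueue` and `DequeueAndMeasure` are hypothetical and not part of the benchmark framework.

```cpp
#include <cassert>
#include <chrono>
#include <queue>

using Clock = std::chrono::steady_clock;  // monotonic, suited for intervals

struct Stamped {
  int payload;
  Clock::time_point enqueued_at;
};

// Enqueues the payload together with the current timestamp.
void Enqueue(std::queue<Stamped>& q, int payload) {
  q.push(Stamped{payload, Clock::now()});
}

// Dequeues one element and returns how long it stayed in the queue.
Clock::duration DequeueAndMeasure(std::queue<Stamped>& q, int* payload) {
  assert(!q.empty());
  const Stamped e = q.front();
  q.pop();
  *payload = e.payload;
  return Clock::now() - e.enqueued_at;
}
```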
31. Stacks
32. Stacks: Related work
Fatourou presented a wait-free “universal” construction
that is applicable for stacks
A highly efficient universal construction (Fatourou, 2011)
33. Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Principle
• Optimized helping scheme
• Threads apply operations to a local copy of the stack
• Every thread tries to replace the global shared object with
its local copy via CAS
• Only applicable for shared objects with small state
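The copy, apply and CAS steps can be sketched on a deliberately tiny stack whose whole state fits into one 64-bit word. This is not the SIM algorithm: the helping scheme is left out, so the sketch is only lock-free, and the packed layout and the function names are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Whole stack state in one 64-bit word: byte 0 is the size (0..7),
// bytes 1..7 hold the elements. Small enough for a single CAS.
std::atomic<std::uint64_t> shared_stack{0};

bool Push(std::uint8_t value) {
  std::uint64_t global = shared_stack.load();
  for (;;) {
    std::uint64_t local = global;                  // 1. local copy
    const std::uint64_t size = local & 0xFF;
    if (size == 7) return false;                   // stack is full
    const unsigned shift = static_cast<unsigned>(8 * (size + 1));
    local &= ~(std::uint64_t{0xFF} << shift);      // clear stale byte
    local |= std::uint64_t{value} << shift;        // 2. apply operation
    local = (local & ~std::uint64_t{0xFF}) | (size + 1);
    // 3. Publish. On failure `global` is reloaded and the loop retries.
    if (shared_stack.compare_exchange_weak(global, local)) return true;
  }
}

bool Pop(std::uint8_t* value) {
  assert(value != nullptr);
  std::uint64_t global = shared_stack.load();
  for (;;) {
    const std::uint64_t size = global & 0xFF;
    if (size == 0) return false;                   // stack is empty
    *value = static_cast<std::uint8_t>(global >> (8 * size));
    const std::uint64_t local = (global & ~std::uint64_t{0xFF}) | (size - 1);
    if (shared_stack.compare_exchange_weak(global, local)) return true;
  }
}
```

The sketch also shows the restriction to small state: every operation copies the complete object, which is only practical if the object is small.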
34. Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Elimination
• Push and Pop have inverse semantics:
Pop(Push(stack, e)) = stack
• Eliminated operations are completed immediately
if they do not alter the object’s state
Significantly improves performance if applicable
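The idea can be illustrated on a batch of announced operations that is processed in order. The sketch below is sequential and shows only the pairing rule, not the published algorithm; `Op` and `Eliminate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Op {
  bool is_push;
  int value;        // argument of a push, result of a pop
  bool eliminated;  // true if the operation never touches the stack
};

// Pairs each pop with the closest preceding unmatched push in the batch.
// The pop receives the pushed value directly; neither touches the stack.
// Returns the number of eliminated pairs.
std::size_t Eliminate(std::vector<Op>& batch) {
  std::vector<std::size_t> open_pushes;  // indices of unmatched pushes
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < batch.size(); ++i) {
    if (batch[i].is_push) {
      open_pushes.push_back(i);
    } else if (!open_pushes.empty()) {
      Op& push = batch[open_pushes.back()];
      open_pushes.pop_back();
      batch[i].value = push.value;
      batch[i].eliminated = push.eliminated = true;
      ++pairs;
    }
  }
  assert(2 * pairs <= batch.size());
  return pairs;
}
```

For the batch push(1), push(2), pop, pop, pop, the first two pops return 2 and 1 without touching the stack. Only the third pop has to be applied to the shared object.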
35. Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Original version is not suitable for real-time applications:
- ABA problem is prevented using tagged pointers
- Thread-local pools with unbounded capacity
- No deallocation in published algorithm
36. Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Modified version of Fatourou’s stack
- Uses hazard pointers for safe reclamation
- Uses compartment pool with limited capacity
- Employs the elimination scheme from the original
publication
37. Stacks:
Evaluation
39. Task scheduling
40. Task Scheduling: Objectives
• Intra-process task scheduling with priority queues
• Low-overhead, fine-grained scheduling of thousands of
small tasks
Priorities:
Focus on low latency and jitter reduction (i.e. predictability);
maximum throughput is only a secondary criterion.
41. Task scheduling: Work stealing
• One worker thread per SMP core, no migration
• Tasks passed as &func
• Load-balancing on task queues
• Many flavors of concrete implementations
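One common flavor, sketched under assumptions: each worker owns a double-ended queue, takes its newest task from one end, and idle workers steal the oldest task from the other end. The mutex is a placeholder to keep the example short, so this sketch is not wait-free, and the names `Worker` and `NextTask` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

using Task = void (*)();  // tasks are passed as plain function pointers

class Worker {
 public:
  void Push(Task t) {
    std::lock_guard<std::mutex> lock(mutex_);
    tasks_.push_back(t);
  }
  // Owner side: newest task first.
  Task PopLocal() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (tasks_.empty()) return nullptr;
    Task t = tasks_.back();
    tasks_.pop_back();
    return t;
  }
  // Thief side: oldest task first.
  Task Steal() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (tasks_.empty()) return nullptr;
    Task t = tasks_.front();
    tasks_.pop_front();
    return t;
  }

 private:
  std::mutex mutex_;  // placeholder for a concurrent queue
  std::deque<Task> tasks_;
};

// Next task for worker `self`: local queue first, then steal from others.
Task NextTask(std::vector<Worker>& workers, std::size_t self) {
  assert(self < workers.size());
  if (Task t = workers[self].PopLocal()) return t;
  for (std::size_t i = 1; i < workers.size(); ++i) {
    if (Task t = workers[(self + i) % workers.size()].Steal()) return t;
  }
  return nullptr;  // no work anywhere
}
```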
42. Task scheduling: Work stealing
Work stealing with task priorities
• Work stealing extended by a task queue for every priority
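How queues per priority can change the choice of the next task is shown in the following sketch. The policy is an assumption made for illustration and may differ from the one used in EMBB: the highest non-empty priority anywhere in the system is served first, and within one priority the local queue is preferred over stealing. Synchronization is omitted entirely, and the names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

constexpr std::size_t kPriorities = 3;  // assumed; 0 is the highest priority

using Task = void (*)();
// One task queue per priority level for each worker.
using WorkerQueues = std::array<std::deque<Task>, kPriorities>;

// Returns the next task for worker `self`, or nullptr if all queues are
// empty. A real scheduler needs concurrent queues instead of std::deque.
Task NextTask(std::vector<WorkerQueues>& workers, std::size_t self) {
  assert(self < workers.size());
  for (std::size_t prio = 0; prio < kPriorities; ++prio) {
    for (std::size_t i = 0; i < workers.size(); ++i) {
      std::deque<Task>& q = workers[(self + i) % workers.size()][prio];
      if (q.empty()) continue;
      Task t = nullptr;
      if (i == 0) {  // local queue: take the newest task
        t = q.back();
        q.pop_back();
      } else {       // foreign queue: steal the oldest task
        t = q.front();
        q.pop_front();
      }
      return t;
    }
  }
  return nullptr;
}
```

Under this policy a worker with only low-priority local work first steals a high-priority task from another worker. That favors latency of important tasks over locality.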
44. Conclusion
Revisiting the objectives
• Wait-free implementations of pools, queues and stacks now
available for real-time applications
• Benchmark framework and evaluation tools (R) are
published as open source
• Reproducible evaluation of real-time performance
• Verification tool chain on the way
45. Conclusion
Recommendations
• Wait-free data structures can rival performance of lock-free
implementations
• But they are hard to maintain
• Formal wait-freedom is hardly achievable in practice
Employ wait-free data structures for fault-tolerance, not as a
guarantee for critical deadlines
46. Thank You
Source code (data structures, benchmarks, R scripts):
https://github.com/fuchsto/embb/tree/benchmark/
Official development source base of embb:
https://github.com/siemens/embb/tree/development/
Wiki to this thesis:
http://wiki.coreglit.ch