Building Scalable Producer-Consumer Pools based on Elimination-Diffraction Trees

Yehuda Afek, Guy Korland, Maria Natanzon, and Nir Shavit

The Pool
Producer-consumer pools, that is, collections of
unordered objects or tasks, are a fundamental
element of modern multiprocessor software and a
target of extensive research and development
[Figure: producers P1 ... Pn call Put(x), Put(y), Put(z) on a shared pool, while consumers C1 ... Cn call Get( ).]

ED-Tree Pool
We present the ED-Tree, a distributed pool
structure based on a combination of the
elimination-tree and diffracting-tree
paradigms, allowing high degrees of
parallelism with reduced contention

Java JDK 6.0:
- SynchronousQueue/Stack (Lea, Scott, and Shearer): a pairing-up function without buffering. Producers and consumers wait for one another.
- LinkedBlockingQueue: producers put their value and leave; consumers wait for a value to become available.
- ConcurrentLinkedQueue: producers put their value and leave; consumers return null if the pool is empty.
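
The three behaviours listed above can be observed directly with the standard java.util.concurrent classes. The following is a minimal sketch; the class name JdkPoolSemantics and its method names are illustrative and not part of the original slides.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: contrasts the three JDK pool flavours listed above.
class JdkPoolSemantics {
    // Non-blocking pool: a get on an empty pool returns null immediately.
    static String emptyGetNonBlocking() {
        return new ConcurrentLinkedQueue<String>().poll();
    }

    // Buffered pool: the producer deposits its value and leaves,
    // and the consumer finds the value later.
    static String putThenGetBuffered() {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<String>();
        q.offer("x");    // succeeds at once on an unbounded queue
        return q.poll(); // take() would block here if the queue were empty
    }

    // Synchronous pool: there is no buffer, so an offer fails
    // when no consumer is already waiting.
    static boolean offerWithoutConsumer() {
        return new SynchronousQueue<String>().offer("x");
    }

    // Synchronous pool: producer and consumer pair up and hand the item over.
    static String handOff() {
        final SynchronousQueue<String> q = new SynchronousQueue<String>();
        Thread producer = new Thread(() -> {
            try {
                q.put("x"); // blocks until a consumer arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
        try {
            return q.poll(5, TimeUnit.SECONDS); // waits for the producer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

In all three classes every request goes through one shared structure, which is the point the next slide picks up.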

Drawback
All these structures are based on a centralized
structure such as a lock-free queue or stack,
and are thus limited in their scalability: the
head of the stack or queue is a sequential
bottleneck and a source of contention.

Some Observations
- A pool does not have to obey either LIFO or
FIFO semantics.
- Therefore, no centralized structure is needed
to hold the items and to serve producers' and
consumers' requests.

New approach
ED-Tree: a combined variant of
the diffracting-tree structure (Shavit and Zemach) and
the elimination-tree structure (Shavit and Touitou)
The basic idea:
- Use randomization to distribute the concurrent
requests of threads onto many locations so that they
collide with one another and can exchange values,
thus avoiding a central place through which all
threads pass.
The result:
- A pool that allows both parallelism and reduced
contention.

A little history
- Both diffraction and elimination were
presented years ago and were claimed to be
effective through simulation.
- However, elimination trees and diffracting
trees were never used to implement real-world
structures.
- Elimination and diffraction were never
combined in a single data structure.

Diffraction trees
A binary tree of objects called balancers [Aspnes-Herlihy-Shavit] with
a single input wire and two output wires

[Figure: tokens 1 to 5 enter a balancer b on its input wire and leave alternately on its two output wires.]

Threads arrive at a balancer, and it alternately sends them to the top and
bottom output wires, so its top wire always carries at most one more thread
than its bottom wire.
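
A balancer of this kind can be sketched as a single toggle bit flipped with compare-and-set. This is a minimal illustration, not the authors' code; the class name ToggleBalancer and the 0 = top / 1 = bottom encoding are assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal balancer: one toggle bit decides the output wire.
class ToggleBalancer {
    private final AtomicBoolean toggle = new AtomicBoolean(false);

    // Returns 0 for the top wire and 1 for the bottom wire.
    int traverse() {
        while (true) {
            boolean old = toggle.get();
            // Flip the bit atomically; retry if another thread flipped it first.
            if (toggle.compareAndSet(old, !old)) {
                return old ? 1 : 0;
            }
        }
    }
}
```

Because successive tokens alternate between the wires, the top wire is never more than one token ahead of the bottom wire.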

Diffraction trees
[Shavit-Zemach]

[Figure: a binary tree of seven balancers b with eight output wires; tokens 1 to 10 enter at the root and are spread evenly across the leaves.]

In any quiescent state (when there are no threads in the tree), the tree
preserves the step property: the output items are balanced so that the
top leaves have output at most one more element than the bottom ones, and
there are no gaps.
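
The step property can be checked on a small tree of toggle bits. The sketch below is an assumption-laden illustration: the class name ToggleTree, the heap-style array layout, and the numbering of output wires (the root's choice is the least significant bit of the wire index) are choices made here, not taken from the slides.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tree of toggle balancers stored in heap order:
// node i has children 2i+1 (top) and 2i+2 (bottom).
class ToggleTree {
    private final int depth;
    private final AtomicInteger[] toggles;

    ToggleTree(int depth) {
        this.depth = depth;
        toggles = new AtomicInteger[(1 << depth) - 1];
        for (int i = 0; i < toggles.length; i++) {
            toggles[i] = new AtomicInteger();
        }
    }

    // Routes one token from the root to a leaf and returns its output wire.
    int traverse() {
        int node = 0;
        int wire = 0;
        for (int level = 0; level < depth; level++) {
            int bit = toggles[node].getAndIncrement() & 1; // 0 = top, 1 = bottom
            wire |= bit << level; // the root's choice is the lowest bit
            node = 2 * node + 1 + bit;
        }
        return wire;
    }

    // Sends `tokens` tokens through a fresh tree, one after another,
    // and returns how many arrived on each output wire.
    static int[] route(int depth, int tokens) {
        ToggleTree tree = new ToggleTree(depth);
        int[] counts = new int[1 << depth];
        for (int i = 0; i < tokens; i++) {
            counts[tree.traverse()]++;
        }
        return counts;
    }
}
```

For example, ToggleTree.route(3, 10) yields the per-wire counts 2, 2, 1, 1, 1, 1, 1, 1: the upper wires hold at most one more token than the lower ones, and there are no gaps.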

Diffraction trees
Connect each output wire to a lock-free queue.

[Figure: a tree of seven balancers b with a queue attached to each output wire.]

To perform a push, threads traverse the balancers from the root to the leaves and
then push the item onto the appropriate queue.
To perform a pop, threads traverse the balancers from the root to the leaves and
then pop from the appropriate queue, blocking if the queue is empty.
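
The push and pop paths can be sketched as follows. This is a simplified, non-blocking illustration under stated assumptions: the class name TreePool is hypothetical, pushes and pops use separate toggle bits (as the later Implementation slide suggests with its pair of toggle bits), and a pop on an empty leaf returns null instead of blocking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pool: a tree of toggle bits with one lock-free queue per leaf.
class TreePool<T> {
    private final int depth;
    private final AtomicInteger[] putToggles; // toggle bits used by pushes
    private final AtomicInteger[] getToggles; // toggle bits used by pops
    private final List<ConcurrentLinkedQueue<T>> leaves =
            new ArrayList<ConcurrentLinkedQueue<T>>();

    TreePool(int depth) {
        this.depth = depth;
        putToggles = newToggles((1 << depth) - 1);
        getToggles = newToggles((1 << depth) - 1);
        for (int i = 0; i < (1 << depth); i++) {
            leaves.add(new ConcurrentLinkedQueue<T>());
        }
    }

    private static AtomicInteger[] newToggles(int n) {
        AtomicInteger[] toggles = new AtomicInteger[n];
        for (int i = 0; i < n; i++) {
            toggles[i] = new AtomicInteger();
        }
        return toggles;
    }

    // Walks from the root to a leaf and returns the leaf index.
    private int route(AtomicInteger[] toggles) {
        int node = 0; // heap order: children of i are 2i+1 and 2i+2
        for (int level = 0; level < depth; level++) {
            int bit = toggles[node].getAndIncrement() & 1;
            node = 2 * node + 1 + bit;
        }
        return node - toggles.length;
    }

    void put(T item) {
        leaves.get(route(putToggles)).add(item);
    }

    // Returns null if the leaf reached is empty (non-blocking variant).
    T get() {
        return leaves.get(route(getToggles)).poll();
    }
}
```

With separate toggle bits, the k-th pop is routed to the same leaf as the k-th push, so pops follow the pushes around the leaves. A blocking pool would presumably replace the leaf queues with blocking ones.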

Diffraction trees
Problem:
Each toggle bit is a hot spot
[Figure: tokens 1, 2, 3 traverse the tree; every balancer b holds a 0/1 toggle bit that all passing threads must update.]

Diffraction trees
Observation:
If an even number of threads pass through a balancer, the
outputs are evenly balanced on the top and bottom wires, but
the balancer's state remains unchanged

The approach:
Add a diffraction array in front of each toggle bit

[Figure: a prism array placed in front of the 0/1 toggle bit.]

Elimination
- At any point while traversing the tree, if a
producer and a consumer collide, there is no
need for them to diffract and continue
traversing the tree.
- The producer can hand its item to the
consumer, and both can leave the tree.

Adding elimination
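[Figure: a Put(x) and a Get( ) meet in an array with cells 1 ... k placed in front of the 0/1 toggle bits; Put(x) returns ok and Get( ) returns x.]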

Using elimination-diffraction balancers
Let the array at each balancer be
a diffraction-elimination array:
- If two producer (or two consumer) threads meet in the
array, they leave on opposite wires without needing to
touch the bit, since it would remain in its original
state anyway.
- If a producer and a consumer meet, they eliminate,
exchanging items.
- If a producer or consumer call does not manage to
meet another in the array, it toggles the respective bit of
the balancer and moves on.
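
The three rules above can be sketched for a single array cell. This is a simplified illustration, not the authors' implementation: the names Kind, Outcome, Request, and CollisionSlot are hypothetical, the spinning and timeout of the waiting thread are left to the caller, and the choice of which partner takes which wire is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

enum Kind { PRODUCER, CONSUMER }

enum Outcome { WAITING, ELIMINATED, DIFFRACTED }

// One pending request. Its fields are filled in by whichever thread
// completes the collision.
class Request {
    final Kind kind;
    volatile Object value;  // item carried (producer) or received (consumer)
    volatile int wire = -1; // after a diffraction: 0 = top, 1 = bottom
    volatile Outcome outcome = Outcome.WAITING; // written last

    Request(Kind kind, Object value) {
        this.kind = kind;
        this.value = value;
    }
}

// One cell of a diffraction-elimination array.
class CollisionSlot {
    private final AtomicReference<Request> slot = new AtomicReference<Request>();

    // Returns true if `me` met a partner, false if `me` was parked in the
    // cell and must now wait until its outcome changes or it times out.
    boolean arrive(Request me) {
        while (true) {
            Request other = slot.get();
            if (other == null) {
                if (slot.compareAndSet(null, me)) {
                    return false; // parked in the empty cell
                }
            } else if (slot.compareAndSet(other, null)) {
                if (other.kind != me.kind) {
                    // Producer meets consumer: hand the item over.
                    if (me.kind == Kind.CONSUMER) {
                        me.value = other.value;
                    } else {
                        other.value = me.value;
                    }
                    me.outcome = Outcome.ELIMINATED;
                    other.outcome = Outcome.ELIMINATED;
                } else {
                    // Same type: leave on opposite wires, toggle bit untouched.
                    other.wire = 0;
                    me.wire = 1;
                    me.outcome = Outcome.DIFFRACTED;
                    other.outcome = Outcome.DIFFRACTED;
                }
                return true;
            }
        }
    }

    // Called by a parked request after a timeout. True means it left
    // unmatched and should go on to the toggle bit.
    boolean withdraw(Request me) {
        return slot.compareAndSet(me, null);
    }
}
```

A parked thread would spin on its own outcome field; if withdraw fails, a partner has already claimed the request and the outcome will be set shortly.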

What about low concurrency levels?
- We show that elimination and diffraction
techniques can be combined to work well at
both high and low loads.
- To ensure good performance at low loads we use
several techniques that make the algorithm adapt
to the current contention level.

Adaptation mechanisms
- Use backoff in space:
  - Randomly choose a cell in a certain range of the array.
  - If the cell is busy (already occupied by two threads), increase the range and
    repeat.
  - Else spin and wait for a collision.
  - If timed out (no collision), decrease the range and repeat.
  - If a certain number of timeouts is reached, spin on the first cell of the array for a
    period, and then move on to the toggle bit and the next level.
  - If a certain number of timeouts is reached, do not try to diffract on any of the
    next levels; go straight to the toggle bit.
- Each thread remembers the last range it used at the current balancer and starts
  from this range next time.
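
The range adaptation can be sketched as a small per-thread object. The slides do not say by how much the range grows or shrinks; doubling and halving, like the names SlotRange, onBusy, onTimeout, and shouldSkipArray, are assumptions made for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical per-thread, per-balancer state for backoff in space.
// Not thread-safe by design: each thread keeps its own instance.
class SlotRange {
    private final int max; // length of the balancer's array
    private int range;     // cells 0 .. range-1 are currently eligible
    private int timeouts;  // timeouts seen so far

    SlotRange(int arrayLength, int initialRange) {
        this.max = arrayLength;
        this.range = Math.max(1, Math.min(arrayLength, initialRange));
    }

    // Picks a random cell within the current range.
    int pick() {
        return ThreadLocalRandom.current().nextInt(range);
    }

    // The chosen cell was busy: contention is high, so spread out.
    void onBusy() {
        range = Math.min(max, range * 2);
    }

    // Nobody showed up in time: contention is low, so close in.
    void onTimeout() {
        timeouts++;
        range = Math.max(1, range / 2);
    }

    // After enough timeouts the thread stops trying to collide
    // and goes straight to the toggle bit.
    boolean shouldSkipArray(int timeoutLimit) {
        return timeouts >= timeoutLimit;
    }

    int range() {
        return range;
    }
}
```

Keeping one such object per thread and balancer gives the "remember the last range" behaviour described above.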

Starvation avoidance
- Threads that failed to eliminate and propagated
all the way to the leaves can wait a long time
for their requests to complete, while new threads
entering the tree and eliminating finish faster.
- To avoid starvation we limit the time a thread
can be blocked in the queues before it retries
the whole traversal.
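
The bounded wait followed by a fresh traversal can be sketched for a consumer as below. The traversal itself is abstracted as a supplier that returns the leaf queue reached. The class name RetryingConsumer and the maxTraversals cap are assumptions added so that the sketch terminates; the slides do not state a retry limit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper: waits at a leaf for a bounded time, then re-traverses.
class RetryingConsumer {
    // `traverse` stands for one root-to-leaf traversal and returns the
    // queue of the leaf that was reached.
    static <T> T get(Supplier<BlockingQueue<T>> traverse,
                     long timeoutMillis, int maxTraversals) {
        for (int attempt = 0; attempt < maxTraversals; attempt++) {
            BlockingQueue<T> leaf = traverse.get();
            try {
                // Block at the leaf, but only for a limited time.
                T item = leaf.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (item != null) {
                    return item;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            // Timed out: start over from the root, where the request may
            // eliminate or reach a different leaf.
        }
        return null;
    }
}
```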

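Implementation
- Each balancer is composed of
an elimination array, a pair of toggle bits, and
two references, one to each of its child nodes.

public class Balancer
{
    // One toggle bit for producers and one for consumers.
    ToggleBit producerToggle, consumerToggle;
    // Cells in which threads try to collide before using a toggle bit.
    Exchanger[] eliminationArray;
    // The two next-level balancers.
    Balancer leftChild, rightChild;
    // Per-thread range used last time at this balancer.
    ThreadLocal<Integer> lastSlotRange;
}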

Implementation
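import java.util.concurrent.atomic.AtomicReference;

public class Exchanger
{
    // The package currently placed in this cell.
    AtomicReference<ExchangerPackage> slot;
}

public class ExchangerPackage
{
    Object value;
    State state; // WAITING / ELIMINATION / DIFFRACTION
    Type type;   // PRODUCER / CONSUMER
}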

Implementation
Starting from the root of the tree:
- Enter the balancer.
- Choose a cell in the array and try to collide with another thread,
  using the backoff mechanism described earlier.
- If a collision with another thread occurred:
  - If both threads are of the same type, leave to the next-level balancer
    (each in a separate direction).
  - If the threads are of different types, exchange values and leave.
- Else (no collision), use the appropriate toggle bit and move to the next
  level.
- If one of the leaves is reached, go to the appropriate queue and
  insert or remove an item according to the thread type.

Performance evaluation
Sun UltraSPARC T2 Plus multi-core machine:
- 2 processors, each with 8 cores
- each core with 8 hardware threads
- 64-way parallelism on a processor and 128-way
  parallelism across the machine

Most of the tests were done on one processor, i.e.,
with at most 64 hardware threads.

A tree with 3 levels and 8 queues.
The queues are
SynchronousBlocking/LinkedBlocking/ConcurrentLinked,
according to the pool specification.

[Figure: the three-level tree of seven balancers b.]

[Figure: Synchronous stack of Lea et al. vs. ED synchronous pool]

[Figure: Linked blocking queue vs. ED blocking pool]

[Figure: Concurrent linked queue vs. ED non-blocking pool]

[Figure: Adding a delay between accesses to the pool; 32 consumers, 32 producers]

[Figure: Changing the percentage of consumers vs. the total number of threads; 64 threads]
