2. Embedded and Parallel Systems Lab2
Paper
B.-H. Yu, P. Werstein, M. Purvis and S. Cranefield,
"Performance improvement techniques for software distributed
shared memory,"
11th International Conference on Parallel and Distributed Systems,
2005. Proceedings, Vol. 1, 20-22 July 2005, pp. 119-125.
Reference
L. Iftode, J.P. Singh and K. Li, "Scope Consistency: A Bridge
between Release Consistency and Entry Consistency," in Proc.
of the 8th Annual ACM Symposium on Parallel Algorithms and
Architectures, 1996.
Outline
Introduction
Implementation of ScC model
Diff Integration Technique
Dynamic Home Migration
Performance Evaluation Environment
Performance Evaluation
Introduction
Implementing parallel algorithms with shared
variables is more convenient than message
passing, in which the programmer must
explicitly send and receive data between
processes.
However, software DSM has not attracted much
interest from the parallel computing community
because of its poor performance.
Introduction
A lazy home-based (LHB) implementation of
scope consistency (ScC)
A diff integration technique that solves most
diff accumulation problems
A dynamic home migration protocol that solves
the static home assignment problem of the
original home-based protocol
Evaluation of these techniques using well-known
DSM benchmark applications
Implementation of ScC model
The LHB protocol does not send diffs to
home nodes between two consecutive
barriers.
It uses an update protocol during lock
synchronization and an invalidation
protocol for the global scope during barrier
synchronization.
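The interplay of the two protocols can be illustrated with a toy, single-process model. This is a minimal sketch under stated assumptions, not the paper's implementation: the class `ToyLHB` and its methods are hypothetical, pages are dictionaries mapping word offsets to values, homes are static, and every write is assumed to happen inside the critical section protected by the lock being transferred. Diffs travel with the lock (update protocol) and reach the home only at the barrier, where non-home copies of written pages are invalidated.

```python
class ToyLHB:
    """Hypothetical toy model of lazy home-based scope consistency (LHB ScC).

    All nodes live in one process, so "sending a message" is a dict update.
    A page absent from a node's copy table is treated as invalid.
    Assumption: every write happens inside the critical section of the lock.
    """

    def __init__(self, n_nodes, home_of):
        self.home_of = home_of  # page -> home node id (static in this toy)
        self.copy = [{p: {} for p in home_of} for _ in range(n_nodes)]
        self.diffs = [{} for _ in range(n_nodes)]  # node -> {page: {offset: value}} not yet at home

    def _fetch(self, nid, page):
        if page not in self.copy[nid]:  # invalid copy: fetch the whole page from its home
            self.copy[nid][page] = dict(self.copy[self.home_of[page]][page])

    def read(self, nid, page, off):
        self._fetch(nid, page)
        return self.copy[nid][page].get(off)

    def write(self, nid, page, off, value):
        self._fetch(nid, page)
        self.copy[nid][page][off] = value
        self.diffs[nid].setdefault(page, {})[off] = value  # record the change as a diff

    def lock_transfer(self, releaser, acquirer):
        """Update protocol: the releaser's diffs are applied at the acquirer and
        handed over with the lock. No message is sent to any home node."""
        for page, diff in self.diffs[releaser].items():
            self._fetch(acquirer, page)
            self.copy[acquirer][page].update(diff)
            self.diffs[acquirer].setdefault(page, {}).update(diff)
        self.diffs[releaser] = {}  # the acquirer now carries these diffs

    def barrier(self):
        """Invalidation protocol for the global scope: flush all pending diffs to
        their homes, then invalidate non-home copies of every written page."""
        written = set()
        for nid, pending in enumerate(self.diffs):
            for page, diff in pending.items():
                self.copy[self.home_of[page]][page].update(diff)
                written.add(page)
            self.diffs[nid] = {}
        for nid, table in enumerate(self.copy):
            for page in written:
                if self.home_of[page] != nid:
                    table.pop(page, None)


if __name__ == "__main__":
    dsm = ToyLHB(3, {0: 2})       # one page (0) whose home is node 2
    dsm.write(0, 0, 0, 7)         # node 0 writes inside the critical section
    dsm.lock_transfer(0, 1)       # lock and diff move to node 1
    print(dsm.read(1, 0, 0))      # 7: node 1 was updated via the lock
    print(dsm.copy[2][0].get(0))  # None: the home has not been updated yet
    dsm.barrier()
    print(dsm.read(0, 0, 0))      # 7: node 0's copy was invalidated and refetched
```

Clearing the releaser's diffs on transfer is a simplification in the spirit of the rule that only the last lock owner updates the home; a real system also has to separate critical-section diffs from other writes.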
Diff Integration Technique
The twin is created before an incoming diff
is applied, not after a write page fault.
In this way, all previous diffs to the same
page made in the same critical section are
preserved and integrated into a single
diff.
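The effect of twinning before diff application can be shown with a short sketch. It assumes, for illustration only, that a page is a list of words, that a diff is a list of `(offset, new_value)` pairs, and that without integration each lock owner must forward every diff it received plus its own; `critical_section` and `lock_chain` are hypothetical helpers, not part of any DSM library.

```python
PAGE_SIZE = 8  # words per page (toy value, an assumption)


def make_diff(page, twin):
    """Return the words that differ from the twin as (offset, new_value) pairs."""
    return [(i, v) for i, (v, t) in enumerate(zip(page, twin)) if v != t]


def apply_diff(page, diff):
    for off, val in diff:
        page[off] = val


def critical_section(page, incoming, writes, integrate):
    """Run one critical section on `page`; return the diffs handed to the next lock owner.

    incoming: diffs received with the lock (update protocol)
    writes:   (offset, value) pairs this node writes inside the critical section
    """
    if integrate:
        twin = page.copy()        # twin BEFORE applying incoming diffs
        for d in incoming:
            apply_diff(page, d)
        carried = []              # incoming changes will reappear in our own diff
    else:
        for d in incoming:
            apply_diff(page, d)
        twin = page.copy()        # twin at the first write fault, after application
        carried = list(incoming)  # received diffs must be forwarded separately
    for off, val in writes:
        page[off] = val
    own = make_diff(page, twin)
    return carried + ([own] if own else [])


def lock_chain(n_owners, integrate):
    """Successive lock owners each write word `owner` on their own copy of one page."""
    diffs = []
    for owner in range(n_owners):
        page = [0] * PAGE_SIZE    # this owner's local copy of the page
        diffs = critical_section(page, diffs, [(owner, owner + 1)], integrate)
    return diffs
```

With three successive lock owners, `lock_chain(3, integrate=False)` hands the next owner three separate diffs, while `lock_chain(3, integrate=True)` hands over a single diff, `[(0, 1), (1, 2), (2, 3)]`, that covers all three writes.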
Dynamic Home Migration
The home-based protocol performs poorly when
a page's home node accesses that page rarely
or less often than other nodes do.
Previously proposed home migration techniques
provide a solution only for single-writer DSM
applications.
To migrate homes at the time of lock
synchronization (acq & rel)
Dynamic Home Migration
This paper proposes a home migration
technique that can determine optimal
home nodes for multiple-writer
applications as well as single-writer
applications.
It uses the barrier process, piggybacking
the best home nodes on other
coherence-related data, thus minimizing
home-finding and data communication
overheads.
Dynamic Home Migration
1. All nodes record their dirty pages between two
consecutive barriers.
2. Upon arrival at a barrier, all nodes create their
final NCS (non-critical-section) diffs.
3. All nodes except the barrier manager send their
invalidation notices, including the diff size of
each dirty page, to the manager node.
4. The barrier manager receives from every node a
barrier arrival notice containing a dirty page list
and the diff size of each dirty page.
5. Each time the manager receives a notice, it
accumulates the dirty pages into a global dirty
page list and, for each dirty page, selects as the
new home the node with the largest diff.
6. On receiving the new home node list, all nodes
update the home nodes by sending their diffs to
the corresponding homes.
Note that only the last lock owner updates the
home nodes with its integrated diffs made during
lock synchronization, and only if the last lock
owner is not the home of the CS diff.
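The manager's decision in steps 4 to 6 can be sketched as follows. This is a hypothetical illustration, not the paper's code: barrier arrival notices are modeled as `node -> {page: diff_size}` dictionaries, ties go to the lower node id (an assumption, the slides do not say), the CS-diff rule from the note is not modeled, and the static home assignment used for comparison (page number modulo node count) is an assumed example. The second helper counts the diff bytes that must cross the network in step 6, which shows why the node with the largest diff is a good home: its own diff stays local.

```python
from collections import defaultdict


def choose_homes(notices):
    """Barrier manager, step 5 (sketch).

    notices: node_id -> {page: diff_size_in_bytes}, one barrier arrival notice per node.
    Returns page -> new home: the node with the largest diff for that page.
    Ties go to the lower node id (an assumption).
    """
    global_dirty = defaultdict(dict)  # page -> {node_id: diff_size}
    for node, dirty in notices.items():
        for page, size in dirty.items():
            global_dirty[page][node] = size
    return {page: max(sizes, key=lambda n: (sizes[n], -n))
            for page, sizes in global_dirty.items()}


def bytes_sent_to_homes(notices, homes):
    """Step 6: diff bytes on the network; a node's diff stays local if it is the home."""
    return sum(size
               for node, dirty in notices.items()
               for page, size in dirty.items()
               if homes[page] != node)


if __name__ == "__main__":
    # Made-up diff sizes for 4 nodes and 3 pages (illustrative only).
    notices = {0: {0: 100, 1: 4000}, 1: {1: 200, 2: 3000}, 2: {0: 4096}, 3: {2: 50}}
    static = {page: page % 4 for page in range(3)}  # assumed static assignment
    migrated = choose_homes(notices)
    print(migrated)                                 # {0: 2, 1: 0, 2: 1}
    print(bytes_sent_to_homes(notices, static))     # 11146
    print(bytes_sent_to_homes(notices, migrated))   # 350
```

For any single page, the bytes sent equal the total diff size minus the home's own diff size, so choosing the node with the largest diff minimizes that page's traffic in this model.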
Performance Evaluation
Environment
TM: TreadMarks, a homeless LRC implementation
CHBLRC: conventional home-based LRC (eager home updates, no diff
integration, static homes)
LHB (or LHB ScC): lazy home-based scope consistency
Cluster of 32 nodes
100 Mbit/s switched Ethernet
Each node: 350 MHz Pentium II CPU
192 MB of memory
Gentoo Linux with gcc 3.3.2
Performance Evaluation
Environment
PNN: parallel neural network application (lock & barrier)
Barnes-Hut: Barnes-Hut N-body algorithm (barrier)
IS: Integer Sort (barrier)
Water: molecular dynamics simulation of water (lock & barrier)
SOR: Successive Over-Relaxation (barrier)