D2MCE
Speaker: Zong-Ying Lyu (呂宗螢)
Adviser: Prof. Wen-Yew Liang (梁文耀)
Date : 2008/07/14
Embedded and Parallel Systems Lab 2
D2MCE
Wireless
Network
DSM
Three State
States: Invalid, Shared, Exclusive
Transitions:
- write miss: sharers = {node}
- invalidate (labels two edges)
- read miss: sharers = sharers + {node}
- fetch
- write hit: sharers = {node}
- read hit / write hit
- read hit
Legend: Black = processed by all nodes; Red = processed only by the home node
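The three states and the sharer-set updates on this slide can be sketched as a small directory kept for one page. This is only an illustration: `page_dir`, `dir_read`, `dir_write`, and `MAX_NODES` are hypothetical names, and downgrading an exclusive holder to shared on a remote read miss is an assumption, since the extracted diagram does not show which edge carries which label.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 8  /* hypothetical cluster size for the sketch */

typedef enum { PAGE_INVALID, PAGE_SHARED, PAGE_EXCLUSIVE } page_state;

/* Hypothetical per-page directory: each node's state plus the sharer set. */
typedef struct {
    page_state state[MAX_NODES];
    uint32_t sharers;            /* bit i set => node i holds a valid copy */
} page_dir;

static void dir_init(page_dir *d) {
    for (unsigned i = 0; i < MAX_NODES; i++) d->state[i] = PAGE_INVALID;
    d->sharers = 0;
}

/* Read access by `node`. Returns 1 on a read miss (page fetched), 0 on a hit. */
static int dir_read(page_dir *d, unsigned node) {
    assert(node < MAX_NODES);
    if (d->state[node] != PAGE_INVALID) return 0;        /* read hit */
    for (unsigned i = 0; i < MAX_NODES; i++)             /* assumed downgrade */
        if (d->state[i] == PAGE_EXCLUSIVE) d->state[i] = PAGE_SHARED;
    d->state[node] = PAGE_SHARED;
    d->sharers |= UINT32_C(1) << node;                   /* sharers + {node} */
    return 1;
}

/* Write access by `node`. Returns the number of invalidations sent. */
static int dir_write(page_dir *d, unsigned node) {
    assert(node < MAX_NODES);
    if (d->state[node] == PAGE_EXCLUSIVE) return 0;      /* exclusive write hit */
    int invalidated = 0;
    for (unsigned i = 0; i < MAX_NODES; i++) {
        if (i != node && d->state[i] != PAGE_INVALID) {
            d->state[i] = PAGE_INVALID;
            invalidated++;
        }
    }
    d->state[node] = PAGE_EXCLUSIVE;
    d->sharers = UINT32_C(1) << node;                    /* sharers = {node} */
    return invalidated;
}
```

With nodes 1 and 2 both reading and node 1 then writing, the sketch sends one invalidation and leaves node 1 as the only sharer.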
Invalidate & update
Update: Node 1, Node 2, Node 3, Node 4
store(A)
update, update, update
load(A)
Invalidate: Node 1, Node 2, Node 3, Node 4
store(A)
invalidate, invalidate, invalidate
load(A)
update
Release Consistency Definition
1. Before an ordinary access is allowed to perform with respect to any other processor, all previous acquires must be performed.
2. Before a release is allowed to perform with respect to any other processor, all previous ordinary reads and writes must be performed.
3. Special accesses are sequentially consistent with respect to one another.
ERC & LRC
Lazy RC
Node 1 Node 2 Node 3
store(A)
store(A)
release
acquire
store(A)
release
acquire
release
acquire
Eager RC
Node 1 Node 2 Node 3
store(A)
release
store(A)
release
acquire
store(A)
release
acquire
acquire
Home-based & Homeless
Homeless
- Diffs are scattered across all the nodes
- Diff store required
- Garbage collection required
Home-based
- Centralized processing; the home copy is always up to date
- No diff store
- No garbage collection
- The home node accesses the shared memory without communication
HLRC
Node 1 Node 2 Home Node 3
store(A)
acquire
release
Load(A)
acquire
release
Invalidate(A)
twin
diff
apply
diff
fetch page
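The twin, diff, and apply-diff steps named on this slide can be sketched as follows. It is a minimal illustration under assumptions: the page is tiny (`DEMO_PAGE_SIZE`), the diff is recorded per byte, and `make_twin`, `make_diff`, `apply_diff`, and `diff_entry` are hypothetical names rather than D2MCE functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 64  /* tiny page for illustration only */

/* One modified byte: where it is and its new value (hypothetical encoding). */
typedef struct {
    size_t offset;
    unsigned char value;
} diff_entry;

/* Before the first write in an interval, keep an unmodified copy (the twin). */
static void make_twin(unsigned char *twin, const unsigned char *page) {
    memcpy(twin, page, DEMO_PAGE_SIZE);
}

/* At release, compare the page with its twin and record the changed bytes.
 * `out` must have room for DEMO_PAGE_SIZE entries. Returns the entry count. */
static size_t make_diff(const unsigned char *twin, const unsigned char *page,
                        diff_entry *out) {
    size_t n = 0;
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++) {
        if (twin[i] != page[i]) {
            out[n].offset = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* At the home node, patch the master copy with the received diff. */
static void apply_diff(unsigned char *home_page, const diff_entry *diff,
                       size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(diff[i].offset < DEMO_PAGE_SIZE);
        home_page[diff[i].offset] = diff[i].value;
    }
}
```

Only the changed bytes travel to the home node, so the home copy is current after the release and other nodes can fetch the whole page from it on their next miss.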
Send only to nodes that are not invalid
Invalid
Node 1 Home Node 2
Not invalid
Node 3
Invalid
Node 4
store(A)
acquire
release
invalidate
acquire
release
req
update
acquire
release
req
update
load(A)
load(A)
reply
HERC Worst Case
4*W count
8*W byte
Node 1 Home Node 2 Node 3 Node 4
acquire
release
store(A)
A (exclusive)
A (invalid)
A (invalid) A (shared) A (invalid) A (invalid)
acquire
release
store(A)
A (invalid)
acquire
release
store(A)
A (exclusive)
A (exclusive)
A (invalid)
acquire
release
store(A)
A (invalid)
A (exclusive)
acquire
release
store(A)
A (exclusive)
A (invalid)
Invalidate
reply
Traditional ERC Worst Case
2(n-1) count
8*W byte
Node 1 Node 2 Node 3 Node 4
acquire
release
store(A)
A (invalid)
A (invalid) A (shared) A (invalid) A (invalid)
Invalidate
reply
acquire
release
store(A)
release
store(A)
acquire
release
acquire
store(A)
A (invalid)
A (invalid)
A (invalid)
A (exclusive)
A (exclusive)
A (exclusive)
A (exclusive)
HLRC Worst Case
1 count
3*4*n+8*sm byte
Node 1 Home Node 2 Node 3 Node 4
acquire
release
store(A)
A (invalid) A (shared) A (invalid) A (invalid)
acquire
release
store(A)
acquire
release
store(A)
acquire
release
store(A)
acquire
reply
A(invalid)
Invalidate(A)
Invalidate(A)
Invalidate(A)
HERC Best Case
Node 1 Home Node 2
acquire
release
store(A)
A (exclusive) A (invalid)
A (invalid) A (invalid)
4 count
8*W byte
Invalidate
reply
acquire
release
store(A)
acquire
release
store(A)
Node 3
A (exclusive)
Traditional ERC Best Case
2(n-1) count
8*W byte
Node 1 Node 2 Node 3 Node 4
acquire
release
store(A)
A (exclusive) A (invalid)
A (invalid) A (invalid) A (exclusive) A (invalid)
Invalidate
reply
acquire
release
store(A)
acquire
release
store(A)
HLRC Best Case
Node 0 Home Node1
A (invalid) A (exclusive)
release
store(A)
1 count
3*4*n+8*sm W byte
acquire
reply
acquire
release
store(A)
acquire
release
store(A)
acquire
Invalidate(A)
D2MCE Architecture
Application
D2MCE Libraries: Join / Leave, Share Memory, Barrier, Mutex, Semaphore
Thread Manager
Communication: Sender, Receiver
Managers: Resource Manager, Share Memory Manager, Barrier Manager, Mutex Manager, Semaphore Manager, …
TCP/IP Based Socket
Processing framework
Node
Process
Communication
Receiver
Thread pool
Thread pool
request
request
Queue
Queue
Queue
Thread pool
assignment
Node
CommunicationProcess
Computing
Thread
(Application)
Resource
Share Memory
Barrier
Mutex
Semaphore
Receiver
Sender
Node
Node
Node
……
Request
Reply
Communication
Thread pool process request
Node
CommunicationProcess
Receiver
Sender
Share Memory thread 1: busy
Share Memory thread 2: sleeping
Share Memory thread 3: busy
Share Memory thread 4: sleeping
request
request
Queue
request
request
request
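The dispatch idea on this slide (busy and sleeping pool threads, with extra requests waiting in a queue) can be modelled without real threads. The sketch below is an assumption about the policy, not D2MCE's implementation: a request goes to the first sleeping worker, otherwise it is queued, and a worker that finishes takes the oldest queued request. All names (`pool_model`, `pool_submit`, `pool_finish`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4          /* four pool threads, as drawn on the slide */
#define QUEUE_CAP 8          /* hypothetical queue capacity */
#define NO_REQUEST (-1)      /* worker is sleeping */
#define SUBMIT_QUEUED POOL_SIZE
#define SUBMIT_REJECTED (-1)

typedef struct {
    int running[POOL_SIZE];  /* request id per worker, or NO_REQUEST */
    int queue[QUEUE_CAP];    /* FIFO ring buffer of waiting request ids */
    size_t head, count;
} pool_model;

static void pool_init(pool_model *p) {
    for (size_t i = 0; i < POOL_SIZE; i++) p->running[i] = NO_REQUEST;
    p->head = 0;
    p->count = 0;
}

/* Returns the worker index that took the request, SUBMIT_QUEUED if every
 * worker is busy, or SUBMIT_REJECTED if the queue is full. */
static int pool_submit(pool_model *p, int request_id) {
    assert(request_id != NO_REQUEST);
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (p->running[i] == NO_REQUEST) {
            p->running[i] = request_id;   /* wake a sleeping worker */
            return (int)i;
        }
    }
    if (p->count == QUEUE_CAP) return SUBMIT_REJECTED;
    p->queue[(p->head + p->count) % QUEUE_CAP] = request_id;
    p->count++;
    return SUBMIT_QUEUED;
}

/* Worker finished its request: take the oldest queued one or go to sleep.
 * Returns the request now running on that worker, or NO_REQUEST. */
static int pool_finish(pool_model *p, size_t worker) {
    assert(worker < POOL_SIZE);
    if (p->count == 0) {
        p->running[worker] = NO_REQUEST;
        return NO_REQUEST;
    }
    p->running[worker] = p->queue[p->head];
    p->head = (p->head + 1) % QUEUE_CAP;
    p->count--;
    return p->running[worker];
}
```

A real pool would protect this state with a mutex and block sleeping workers on a condition variable; the model only shows the assignment order.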
Memory Pool
Memory address, low to high: 64 | 1024 | 10240 | Other | Free
Block sizes: 64, 1024, 10240, other
Memory Pool
struct memory_info {
    size_t size;
};
Table 1: memory information structure
Figure 5: memory pool memory block
- mem_malloc
- mem_free
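A pool with the block sizes shown (64, 1024, 10240, other) and a size header in front of each block might look like the sketch below. The slides give only the names `mem_malloc` and `mem_free`, not their signatures, so the sketch uses the hypothetical names `pool_alloc` and `pool_free`. The per-class free lists, the fallback to `malloc` for "other" sizes, and the extra `next_free` field in the header are all assumptions, and the sketch is not thread safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_CLASSES 3
static const size_t class_size[NUM_CLASSES] = { 64, 1024, 10240 };

/* Header stored in front of every block. The union keeps the user pointer
 * suitably aligned; `next_free` is an addition for the free list. */
typedef union block_header {
    struct {
        size_t size;                    /* capacity of the block in bytes */
        union block_header *next_free;  /* link while on a free list */
    } info;
    max_align_t align;
} block_header;

static block_header *free_lists[NUM_CLASSES];

/* Index of the smallest class that fits, or NUM_CLASSES for "other". */
static size_t class_of(size_t size) {
    for (size_t i = 0; i < NUM_CLASSES; i++)
        if (size <= class_size[i]) return i;
    return NUM_CLASSES;
}

static void *pool_alloc(size_t size) {
    size_t c = class_of(size);
    size_t cap = (c < NUM_CLASSES) ? class_size[c] : size;
    block_header *h;
    if (c < NUM_CLASSES && free_lists[c] != NULL) {
        h = free_lists[c];              /* reuse a cached block */
        free_lists[c] = h->info.next_free;
    } else {
        h = malloc(sizeof *h + cap);
        if (h == NULL) return NULL;
    }
    h->info.size = cap;
    h->info.next_free = NULL;
    return h + 1;                       /* user memory follows the header */
}

static size_t pool_block_size(const void *p) {
    assert(p != NULL);
    return ((const block_header *)p - 1)->info.size;
}

static void pool_free(void *p) {
    if (p == NULL) return;
    block_header *h = (block_header *)p - 1;
    size_t c = class_of(h->info.size);
    if (c < NUM_CLASSES) {              /* cache fixed-size blocks */
        h->info.next_free = free_lists[c];
        free_lists[c] = h;
    } else {
        free(h);                        /* "other" sizes go back to the heap */
    }
}
```

Because the header records the block size, the free routine can find the right list without the caller passing a size.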
Thread safe
- All functions are thread safe
struct request_header {
    unsigned short msg_type;   // message type
    unsigned int size;         // package size
    unsigned int src_node;     // source node id
    unsigned int src_index;    // source index number
    unsigned int des_index;    // destination index number
};
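The slides do not say how this header is put on the wire. One portable option, shown here purely as an assumption, is to pack each field in big-endian order so that nodes with different byte order or struct padding agree on the layout. `header_pack`, `header_unpack`, and `HEADER_WIRE_SIZE` are hypothetical names, and the sketch assumes the `unsigned int` fields fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct request_header {
    unsigned short msg_type;   /* message type */
    unsigned int size;         /* package size */
    unsigned int src_node;     /* source node id */
    unsigned int src_index;    /* source index number */
    unsigned int des_index;    /* destination index number */
};

#define HEADER_WIRE_SIZE 18    /* 2 bytes + 4 fields * 4 bytes, no padding */

static void put_u16(unsigned char *p, uint16_t v) {
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xff);
}

static void put_u32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xff);
    p[2] = (unsigned char)((v >> 8) & 0xff);
    p[3] = (unsigned char)(v & 0xff);
}

static uint16_t get_u16(const unsigned char *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_u32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Writes HEADER_WIRE_SIZE bytes to buf and returns that count. */
static size_t header_pack(const struct request_header *h, unsigned char *buf) {
    assert(h != NULL && buf != NULL);
    put_u16(buf, (uint16_t)h->msg_type);
    put_u32(buf + 2, (uint32_t)h->size);
    put_u32(buf + 6, (uint32_t)h->src_node);
    put_u32(buf + 10, (uint32_t)h->src_index);
    put_u32(buf + 14, (uint32_t)h->des_index);
    return HEADER_WIRE_SIZE;
}

/* Reads HEADER_WIRE_SIZE bytes from buf into h. */
static void header_unpack(struct request_header *h, const unsigned char *buf) {
    assert(h != NULL && buf != NULL);
    h->msg_type = get_u16(buf);
    h->size = get_u32(buf + 2);
    h->src_node = get_u32(buf + 6);
    h->src_index = get_u32(buf + 10);
    h->des_index = get_u32(buf + 14);
}
```

Sending the raw struct would also work between identical machines, but the compiler may insert padding after `msg_type`, which is why the sketch packs field by field.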
Two Level Parallel
Job
Parallel on Cluster: CPU, CPU
Parallel on Multi-Core or CPU: Core1 Core1 (per CPU)
Multi-threaded calls to d2mce functions
Node 1
load(A)
thread2
Home node2
thread1
load(A)
store(A)
block
A(invalid)
A(shared)
A's state is shared, so don't send a request
barrier
A(exclusive)
Node1 Access
Node2 Access
Node2
False Sharing
Node1
Page
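False sharing arises when two nodes access different data that happen to live on the same page. A quick way to check whether two byte ranges of a shared region collide at page granularity is sketched below; `page_of` and `shares_page` are hypothetical helpers, and the page size is passed in rather than assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the page that contains byte `offset`. */
static size_t page_of(size_t offset, size_t page_size) {
    assert(page_size > 0);
    return offset / page_size;
}

/* Returns 1 if the byte ranges [off_a, off_a+len_a) and [off_b, off_b+len_b)
 * touch at least one common page, 0 otherwise. */
static int shares_page(size_t off_a, size_t len_a,
                       size_t off_b, size_t len_b, size_t page_size) {
    assert(len_a > 0 && len_b > 0);
    size_t a_first = page_of(off_a, page_size);
    size_t a_last = page_of(off_a + len_a - 1, page_size);
    size_t b_first = page_of(off_b, page_size);
    size_t b_last = page_of(off_b + len_b - 1, page_size);
    return a_first <= b_last && b_first <= a_last;
}
```

For example, with 4096-byte pages, two disjoint 2048-byte halves of the first page still share that page, while two page-aligned 4096-byte ranges do not.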
Multiple-Writer Protocols
multiple-writer protocol
int d2mce_mload(void *share_memory, unsigned int offset,
                unsigned int length);
int d2mce_mstore(void *share_memory, unsigned int offset,
                 unsigned int length);
Table 3: Multiple-writer protocol functions
Figure 8: Multiple-writer protocol
multiple-writer protocol
if (node_id == 0)
    d2mce_store(SM);                // SM = share memory
d2mce_barrier(&barrier, nodes);     // nodes = number of nodes
d2mce_mload(SM, start*sizeof(TYPE), end*sizeof(TYPE));
Table 4: Scatter program pattern
d2mce_mstore(SM, start*sizeof(TYPE), end*sizeof(TYPE));
d2mce_barrier(&barrier, nodes);
if (node_id == 0)
    d2mce_load(SM);
Table 5: Gather program pattern
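The scatter and gather patterns need a per-node slice of the shared array. The sketch below shows one way to compute it, assuming an even block partition and assuming the arguments are a byte offset and a byte length, as the `offset` and `length` parameter names in the prototypes suggest. `node_slice` and `slice` are hypothetical names, and the D2MCE call appears only in a comment.

```c
#include <assert.h>

/* Byte offset and byte length of one node's part of the shared array. */
typedef struct {
    unsigned int offset;
    unsigned int length;
} slice;

/* Splits `count` elements of `elem_size` bytes across `nodes` nodes.
 * The first (count % nodes) nodes receive one extra element. */
static slice node_slice(unsigned int node_id, unsigned int nodes,
                        unsigned int count, unsigned int elem_size) {
    assert(nodes > 0 && node_id < nodes);
    unsigned int base = count / nodes;
    unsigned int extra = count % nodes;
    unsigned int start = node_id * base + (node_id < extra ? node_id : extra);
    unsigned int n = base + (node_id < extra ? 1u : 0u);
    slice s = { start * elem_size, n * elem_size };
    return s;
}

/* Hypothetical use on each node (not compiled here):
 *   slice s = node_slice(node_id, nodes, N, sizeof(TYPE));
 *   d2mce_mload(SM, s.offset, s.length);
 */
```

Splitting 10 four-byte elements over 4 nodes gives slices of 12, 12, 8, and 8 bytes, which together cover all 40 bytes without overlap.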
Dynamic manager migration
int d2mce_sethome(void *share_memory);
int d2mce_ibarrier_manager();
int d2mce_isem_manager();
int d2mce_imutex_manager();
int d2mce_iresource_manager();
manager migration
New manager
Node 0
Old manager
Node1 Node 2 Node 3
manage
information
I home
request
Init & set
manage
information
ok
new manager
lock & wait
service
forward
unlock &
forward
request
request
new manager
block
HRC broadcast
Node 1 Home Node 2 Node 3 Node 4
acquire
release
store(A)
acquire
release
load(A)
acquire
release
load(A)
release
load(A)
acquire
latency
Node 1 Home Node 2 Node 3 Node 4
acquire
release
store(A)
acquire
load(A)
acquire
load(A)
load(A)
acquire
HRC broadcast barrier
barrier latency
Node 2 Home Node 3 Node 1 Node 4 Node 5 Node 6
store(A)
update
update
update
register node
Home based Disseminate Update
load(A) load(A) load(A) load(A)
not
invalid
invalidate
invalid
Broadcast coding pattern
Use mutex
  Store node:
    d2mce_mutex_lock(&m1);
    d2mce_store(A);
    d2mce_mutex_unlock(&m1);
  All nodes that need to load:
    d2mce_mutex_lock(&m1);
    d2mce_load(A);
    d2mce_mutex_unlock(&m1);
Use barrier
  Store node:
    d2mce_store(A);
    d2mce_barrier(&b1, neednodes);
  All nodes that need to load:
    d2mce_barrier(&b1, neednodes);
    d2mce_load(A);
Use semaphore
  Store node:
    d2mce_store(A);
    for (i = 0; i < neednodes; i++)
        d2mce_sem_post(&m1);
  All nodes that need to load:
    d2mce_sem_wait(&m1);
    d2mce_load(A);
Home based Disseminate Update
int d2mce_update_register(void* share_memory);
int d2mce_update_unregister(void* share_memory);
Home based Disseminate Register
Node 1 Home Node 2
Register update
1
Input the table
Node 1 Home Node 2
Unregister update
Clear the node
Event driven
int d2mce_checkUpdate(void* share_memory);
Event driven (update)
Node 1
Node 2
store(A)
update
load(A) load(A)
ShareMemory
thread
Computing
thread
update A
checkupdate(A)
signal
Event driven (invalid)
Node 1
Node 2
store(A)
invalid
load(A)
load(A)
Share
Memory
thread
Computing
thread
invalid A
checkupdate(A)
signal
update
request
Write and immediately load coding pattern
Use mutex
  Store node:
    d2mce_mutex_lock(&m1);
    d2mce_store(A);
    d2mce_mutex_unlock(&m1);
  Load node:
    while (1) {
        d2mce_mutex_lock(&m1);
        d2mce_load(A);
        d2mce_mutex_unlock(&m1);
    }
Use barrier
  Store node:
    d2mce_store(A);
    d2mce_barrier(&b1, neednodes);
  Load node:
    while (1) {
        d2mce_barrier(&b1, neednodes);
        d2mce_load(A);
    }
Use semaphore
  Store node:
    d2mce_store(A);
    for (i = 0; i < neednodes; i++)
        d2mce_sem_post(&m1, neednodes);
  Load node:
    while (1) {
        d2mce_sem_wait(&m1);
        d2mce_load(A);
    }
Use event driven
  Store node:
    d2mce_store(A);
  Load node:
    while (1) {
        d2mce_checkUpdate(A);
        d2mce_load(A);
    }
Evaluation
MM
Size         1             2                           4
128*128      0.0224598     0.0150916 [1.488231864]     0.0149468 [1.502649397]
256*256      0.1624132     0.09476025 [1.71393807]     0.07156825 [2.269347092]
512*512      1.3165244     0.6979126 [1.886374311]     0.438122 [3.004926482]
1024*1024    38.787176     20.96464 [1.850123637]      10.51557 [3.688547173]
2048*2048    362.6819634   184.635501 [1.964313263]    91.1462238 [3.979122209]
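The bracketed figures in the table match the ratio of the column 1 time to the time in that column, which is the usual speedup definition. The sketch below computes that ratio and the corresponding parallel efficiency. `speedup` and `efficiency` are hypothetical helper names, and reading the column headers 1, 2, 4 as a worker count is an assumption, since the slide does not say whether they are nodes or threads.

```c
#include <assert.h>

/* Speedup of a parallel run relative to the baseline run. */
static double speedup(double t_base, double t_parallel) {
    assert(t_base > 0.0 && t_parallel > 0.0);
    return t_base / t_parallel;
}

/* Efficiency: speedup divided by the number of workers used. */
static double efficiency(double t_base, double t_parallel,
                         unsigned int workers) {
    assert(workers > 0);
    return speedup(t_base, t_parallel) / (double)workers;
}
```

For the 512*512 row, `speedup(1.3165244, 0.438122)` gives about 3.005, matching the bracketed value, and the efficiency on 4 workers is about 0.75.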

  • 47. Embedded and Parallel Systems Lab 47 Reference 13. W. Fang, C.-L. Wang, W. Zhu, and F. C. Lau. “A novel adaptive home migration protocol in home-based DSM.” In Proc.of the 2004 IEEE International Conference on Cluster Computing (Cluster2004), pages 215-224, 2004. 14. Sandhya Dwarkadas, Peter Keleher, Alan L. Cox, Willy Zwaenepoel, “Evaluation of release consistent software distributed shared memory on emerging network technology.” ACM SIGARCH Computer Architecture News Volume 21 , Issue 2, Pages: 144 - 155 , May 1993 15. Weiwu Hu, Weisong Shi, Zhimin Tang, Zhiyu Zhou, “JIAJIA: An SVM System Based on a New Cache Coherence Protocol (1998)”, Proc. of the High-Performance Computing and Networking Europe 1999 (HPCN'99) 16. Wen-Yew Liang, Yu-Ming Hsieh and Zong-Ying Lyu, “Design of a Dynamic Distributed Mobile Computing Environment,” in the Proceedings of the 13th International Conference on Parallel and Distributed Systems (ICPADS 2007), Dec. 5-7, 2007, Hsinchu, Taiwan, NSC: 96-2221-E-027-023. (EI)
  • 48. Reference 17. Shun-Yun Hu, Guan-Ming Liao, “Scalable peer-to-peer networked virtual environment”, Network and System Support for Games Proceedings of 3rd ACM SIGCOMM workshop on Network and system support for games, Pages: 129 – 133, Year of Publication: 2004 18. Matt Welsh, Steven D. Gribble, Eric A. Brewer, David Culler,”A Design Framework for Highly Concurrent System”, EECS Department University of California, Berkeley Technical Report No. UCB/CSD-00-1108 2000.