Distributed Systems

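Advantages
 Resource Sharing
 When processes at different sites are connected, user A can use user B's resources
 Computation Speedup
 A hard, complex problem is divided among several processors that solve it together
 Reliability
 Each processor has its own independent memory, so when one processor fails
the others keep working unaffected and can help with the recovery
 Communication
 All connected users can communicate and consult with each other over the network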

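Types of Operating Systems
 Data Migration
 Site A ─Data→ Site B
 The data transferred depends on what is needed, but the format must be consistent so that no data is lost
 Computation Migration
 The user sends commands over the network to a remote processor
 The remote processor executes them with its local resources
 The result is then returned to the user
 Process Migration
 A process is sent over the network to run at a remote site. Reasons for doing so:
 Load Balancing
 Computation Speedup
 Hardware / Software Preference
 Data Access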

C Socket for Windows
 Server.c
#include <winsock2.h>
#include <stdio.h>
int main() {
    SOCKET server_sockfd, client_sockfd;
    int server_len, client_len;
    struct sockaddr_in server_address, client_address;
    // Initialize the Winsock DLL (requests version 1.1)
    WSADATA wsadata;
    if (WSAStartup(0x101, (LPWSADATA)&wsadata) != 0) return 1;
    // Create the server socket
    server_sockfd = socket(AF_INET, SOCK_STREAM, 0);
    // AF_INET (IPv4); SOCK_STREAM with protocol 0 selects TCP

    server_address.sin_family = AF_INET;
    server_address.sin_addr.s_addr = inet_addr("127.0.0.1");
    server_address.sin_port = htons(1234); // port must be in network byte order
    server_len = sizeof(server_address);

    bind(server_sockfd, (struct sockaddr *)&server_address, server_len);

    listen(server_sockfd, 5); // 5 = backlog (length of the pending-connection queue)

    while (1) {
        char ch;
        printf("Server waiting...\n");
        client_len = sizeof(client_address);
        client_sockfd = accept(server_sockfd, (struct sockaddr *)&client_address, &client_len);
        recv(client_sockfd, &ch, 1, 0); // receive 'A'
        ch++;                           // 'A' -> 'B'
        send(client_sockfd, &ch, 1, 0); // send 'B'
        closesocket(client_sockfd);
    }
    // Not reached while the loop runs forever. WSACleanup belongs here, not
    // inside the loop, or Winsock is shut down after the first client.
    closesocket(server_sockfd);
    WSACleanup();
    return 0;
}

 Client.c

#include <winsock2.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
    SOCKET sockfd;
    int len, result;
    struct sockaddr_in address;
    char ch = 'A';
    // Initialize the Winsock DLL (requests version 2.2)
    WSADATA wsadata;
    WSAStartup(0x202, (LPWSADATA)&wsadata);
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = inet_addr("127.0.0.1");
    address.sin_port = htons(1234); // same port as the server, in network byte order
    len = sizeof(address);
    result = connect(sockfd, (struct sockaddr *)&address, len);
    if (result == SOCKET_ERROR) {
        printf("connect failed\n");
        WSACleanup();
        return 1;
    }
    send(sockfd, &ch, 1, 0); // send 'A'
    recv(sockfd, &ch, 1, 0); // receive 'B'
    printf("char from server = %c\n", ch);
    closesocket(sockfd);
    WSACleanup();
    system("pause");
    return 0;
}

Client and server with threads

[Figure: the client has two threads: thread 1 generates results and thread 2 makes requests to the server. In the server, an input-output thread handles receipt and queuing of requests, which are served by N threads.]

Distributed Systems: Concepts and Design

Alternative server threading architectures

[Figure: three server threading architectures, each serving remote objects: a. Thread-per-request (an I/O thread plus workers); b. Thread-per-connection (per-connection threads); c. Thread-per-object (an I/O thread plus per-object threads).]


C Thread
 Link option: -lpthreadGC2
 pthread.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
void *thread_func(void *arg);
char message[] = "Hello World";
int main() {
    pthread_t thread;
    void *thread_result;
    // Start a thread that runs thread_func(message)
    pthread_create(&thread, NULL, thread_func, (void *)message);
    printf("Waiting for thread to finish...\n");
    // Block until the thread exits and collect its return value
    pthread_join(thread, &thread_result);
    printf("Thread joined, it returned %s\n", (char *)thread_result);
    system("pause");
    return 0;
}
void *thread_func(void *arg) {
    printf("thread %s is running\n", (char *)arg);
    sleep(3);
    pthread_exit("Thank you for the CPU time\n");
}

Java TCP Socket (per-connection threads)
 Client.java

import java.net.*;
import java.io.*;
public class Client {
    public static void main (String args[]) {
        Socket s = null;
        try{
            int serverPort = 1234;
            s = new Socket("localhost", serverPort);
            DataInputStream in = new DataInputStream( s.getInputStream());
            DataOutputStream out = new DataOutputStream( s.getOutputStream());
            out.writeUTF("Hello");
            String data = in.readUTF();
            System.out.println("Received: "+ data) ;
            s.close();
        }catch (IOException e){
            System.out.println(e.getMessage());
        }finally {
            if(s!=null)
                try {s.close();}
                catch (IOException e){}
        }
    }
}

 Server.java

import java.net.*;
import java.io.*;
public class Server {
public static void main(String args[]) {
try{
int serverPort = 1234;
ServerSocket listenSocket = new ServerSocket(serverPort);
while(true) {
Socket clientSocket = listenSocket.accept();
Connection c = new Connection(clientSocket);
}
} catch(IOException e) {
System.out.println(e.getMessage());
}
}
}

 Connection.java
import java.net.*;
import java.io.*;
class Connection extends Thread {
    DataInputStream in;
    DataOutputStream out;
    Socket clientSocket;
    public Connection (Socket ClientSocket) {
        try {
            clientSocket = ClientSocket;
            in = new DataInputStream( clientSocket.getInputStream());
            out = new DataOutputStream( clientSocket.getOutputStream());
            this.start();
        } catch(IOException e){
            System.out.println(e.getMessage());}
    }
    public void run(){
        try {
            String data = in.readUTF();
            out.writeUTF("client data is " + data);
        } catch(IOException e) {
            System.out.println(e.getMessage());
        } finally {
            try {
                clientSocket.close();
            } catch (IOException e) {}
        }
    }
}

Types of Clock Synchronization
 External
 Synchronize all clocks against a single one, usually
the one with external, accurate time information
 Internal
 Synchronize all clocks among themselves

 At least time monotonicity must be preserved

 External (accuracy):
synchronize to an authoritative time source
 Each system clock Ci differs by at most Dext at
every point in the synchronization interval
from an external UTC source S:
|S - Ci| < Dext for all i
[Figure: clocks C1, C2 and C3, each synchronized to the source S]

 Internal (agreement):
the clocks cooperate to synchronize with each other
 Any two system clocks Ci and Cj differ by at
most Dint at every point in the synchronization
interval from each other:
|Cj - Ci| < Dint for all i and j
[Figure: clocks C1, C2 and C3 synchronized with one another]

 Dext and Dint are synchronization bounds
 Dint <= 2Dext
 Max-Synch-interval = Dint / 2Dext
 It means:
 If two events have single-value timestamps which
differ by less than some value, we CAN'T SAY in
which order the events occurred.
 With interval timestamps, when intervals overlap, we
CAN'T SAY in which order the events occurred.

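Clock Synchronization in a Synchronous System
[Figure: timelines of A's and B's clock time against real time. A sends its clock time TA; the message takes Ttrans to arrive, when A's clock reads TA + Ttrans and B's clock reads TB.]
Tmin < Ttrans < Tmax
Estimating Ttrans = (Tmin + Tmax)/2 is wrong by at most (Tmax - Tmin)/2
If A sends its clock time TA to B
→ B can set its clock to TA + (Tmin + Tmax)/2
→ then A and B are synchronized with bound (Tmax - Tmin)/2
[Figure: Ttrans lies between Tmin and Tmax; the midpoint (Tmin + Tmax)/2 is within (Tmax - Tmin)/2 of either end.]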
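
As a worked example of the bound for a synchronous system, the following Java sketch computes the value B would set and the worst-case error. The class name, method names and the millisecond values are illustrative assumptions, not part of the original slides.

```java
// Sketch: B synchronizes to A when the transmission delay is known to lie
// between tMin and tMax. All names and sample values are assumptions.
class SyncBound {
    // Clock value B adopts after receiving A's time tA: TA + (Tmin + Tmax)/2.
    static double estimate(double tA, double tMin, double tMax) {
        return tA + (tMin + tMax) / 2.0;
    }

    // Worst-case error of that estimate: (Tmax - Tmin)/2.
    static double bound(double tMin, double tMax) {
        return (tMax - tMin) / 2.0;
    }

    public static void main(String[] args) {
        double tA = 1000.0, tMin = 2.0, tMax = 10.0; // milliseconds
        System.out.println("B sets its clock to " + estimate(tA, tMin, tMax));
        System.out.println("Synchronized within +/- " + bound(tMin, tMax));
    }
}
```

With these sample values the estimate is TA + 6 ms and the bound is 4 ms, because the true delay can be at most 4 ms away from the midpoint.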

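Clock Synchronization in an Asynchronous System
[Figure: timelines of A's and B's clock time. A sends a request at TA and receives B's reply, carrying B's clock time TB, at T'A. The round trip takes Tround, and B's clock is estimated as TB + Tround/2 when the reply arrives.]

 In an asynchronous system, we have no Tmax
 How can A synchronize with B?
 By using the round-trip time Tround = T'A - TA in Cristian's algorithm:
A sets its clock to TB + Tround/2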

Java RMI (External Clock Synchronization)

 Clock.java
import java.rmi.*;
public interface Clock extends Remote{
String getTime() throws RemoteException;
}
 ClockImpl.java
import java.rmi.*;
import java.rmi.server.*;
import java.util.*;
public class ClockImpl extends UnicastRemoteObject implements Clock {
public ClockImpl() throws RemoteException {
super();
}
public String getTime() {
Date d = new Date();
return d.toString();
}
}

 ClockServer.java

import java.rmi.*;
public class ClockServer {
    public ClockServer() {
        try {
            Clock c = new ClockImpl();
            // Register the remote object; an RMI registry must already be running on localhost
            Naming.rebind("//localhost/ClockService",c);
        } catch (Exception e) {
            System.out.print(e.getMessage());
        }
    }
    public static void main(String args[]) {
        new ClockServer();
    }
}

 ClockClient.java

import java.rmi.*;
import java.net.*;
public class ClockClient {
    public static void main(String args[]) {
        try {
            // Look up the remote clock and print the server's time
            Clock c = (Clock)Naming.lookup("//localhost/ClockService");
            System.out.println(c.getTime());
        } catch (Exception e) {
            System.out.print(e.getMessage());
        }
    }
}

Logical time
 One aspect of clock synchronization is to provide a mechanism
whereby systems can assign sequence numbers (“timestamps”) to
messages upon which all cooperating processes can agree.
 Leslie Lamport (1978) showed that clock synchronization need
not be absolute; two of Lamport's important points lead to
“causality”
 First point:
 If two processes do not interact, it is not necessary that their
clocks be synchronized
 they can operate concurrently without fear of interfering with each
other
 Second (critical) point:
 It is not important that all processes agree on time, but
rather, that they agree on the order in which events occur
 Such “clocks” are referred to as Logical Clocks
 Logical time is based on happens-before relationship
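
One common way to realize such a logical clock is a Lamport counter. The sketch below is a minimal illustration, assuming one counter per process and integer timestamps carried on messages; the class and method names are hypothetical.

```java
// Sketch of a Lamport logical clock (names are illustrative assumptions).
class LamportClock {
    private long time = 0;

    // Internal event or send: advance the local counter and use it as the timestamp.
    synchronized long tick() {
        return ++time;
    }

    // Receive: move past the sender's timestamp, counting the receive as an event.
    synchronized long receive(long senderTime) {
        time = Math.max(time, senderTime) + 1;
        return time;
    }

    synchronized long now() {
        return time;
    }

    public static void main(String[] args) {
        LamportClock p = new LamportClock();
        LamportClock q = new LamportClock();
        long sent = p.tick();        // p sends a message stamped with its new time
        q.tick();                    // q has an unrelated internal event
        long got = q.receive(sent);  // q's clock ends up later than the send timestamp
        System.out.println("send=" + sent + " receive=" + got);
    }
}
```

The receive rule is what preserves happens-before: a receive event always gets a larger timestamp than the matching send, while unrelated events may get timestamps in either order.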

Event Ordering
 Happens-before and concurrent events illustrated

[Figure: events e1 and e2 with no causal path from e1 to e2 or from e2 to e1, so e1 and e2 are concurrent]

Types of events
Send
Receive
Internal (change of state)

Co-ordination
 Difficulties in distributed systems
 Centralised solutions not appropriate
 communications bottleneck
 Fixed master-slave arrangements not appropriate
 process crashes
 Varying network topologies
 ring, tree, arbitrary; connectivity problems
 Failures must be tolerated if possible
 link failures
 process crashes
 Impossibility results
 in presence of failures, esp asynchronous model

Mutual Exclusion
 Requirements
 Safety
 At most one process may execute in CS at any time
 Liveness
 Every request to enter and exit a CS is eventually granted
 Ordering (desirable)
 Requests to enter are granted according to causality order (FIFO)

Synchronization scheme       Centralized                   Distributed
Based on mutual exclusion    Central process               Circulating token
No mutual exclusion          Physical clock, event count   Physical clocks, logical clocks

Mutual Exclusion
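 Three main implementation approaches
 Centralized Approach
 When P1 wants to enter the critical section, it sends a Request message to the coordinator C.
If the critical section is free, C sends back a Reply message and P1 may enter
 If P2 also wants to enter the critical section at this point, C puts P2's Request in the waiting queue
 When P1 leaves the critical section, it sends a Release message to C, and C sends a Reply to
the process that owns the next Request in the waiting queue
 Distributed Approach
 Compares timestamps
 Each node must know the names of all nodes on the network and announce its own name to them, so nodes should not be added often
 When a node fails, the system should notify the other nodes at once and repair it, so the nodes need regular maintenance
 A process that has not yet entered the critical section repeatedly stalls waiting on the actions of other processes
 Token Passing Approach
 Choose a suitable path so that no node suffers starvation
 If the token is lost, the system should generate a new token to recover
 If a node on the path fails, the system should rebuild the best new path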

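Two-Phase Commit Protocol
 Phase 1
 The coordinator logs <prepare T> and sends prepare(T) to every site
 Each site replies ready(T) after logging <ready T>, or abort(T) after logging <no T>
 Phase 2
 The coordinator logs <commit T> and sends commit(T) if every site replied ready(T); otherwise it logs <abort T> and sends abort(T)
 Each site replies acknowledge(T)
 When all acknowledgements have arrived, the coordinator logs <complete T>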
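
The coordinator's decision at the start of phase 2 can be sketched in Java as follows. This is a simplified illustration that ignores logging, timeouts and failures; the `Vote` and `Decision` enums and the `decide` method are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the coordinator's phase-2 decision in two-phase commit.
// Logging, timeouts and site failures are deliberately left out.
class TwoPhaseCommit {
    enum Vote { READY, ABORT }
    enum Decision { COMMIT, ABORT }

    // Commit only if every site voted READY; any ABORT vote aborts T.
    static Decision decide(List<Vote> votes) {
        if (votes.isEmpty()) {
            return Decision.ABORT; // assumption: no votes means nothing to commit
        }
        for (Vote v : votes) {
            if (v != Vote.READY) {
                return Decision.ABORT;
            }
        }
        return Decision.COMMIT;
    }

    public static void main(String[] args) {
        System.out.println(decide(Arrays.asList(Vote.READY, Vote.READY)));
        System.out.println(decide(Arrays.asList(Vote.READY, Vote.ABORT)));
    }
}
```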

Deadlock Prevention and Avoidance
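 Resource Ordering Algorithm
 Impose a global ordering on all resources in the network and give each
resource a unique number
 A process that currently holds resource i may not request any resource numbered lower than i,
so a circular wait cannot form
 Simple to implement; requires little overhead
 Banker's Algorithm
 The distributed system selects the most suitable process to act as the banker, which manages all
resources on the network and allocates them appropriately to each process

 (New) Timestamp Priority Algorithm
 The timestamp (TS) of every process on the network serves as its priority number
 The smaller the TS, the higher the priority (the earlier the process started)
 Only a higher-priority process may request a resource from a lower-priority process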

Timestamp Priority Algorithm

[Figure: example with process timestamps TR=5, TR=10, TR=10 and TR=15]

Deadlock Detection

Local Wait-For Graph    Global Wait-For Graph

 Centralized Approach
 Distributed Approach

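Complexity Measures
 Computational Rounds
 Synchronous algorithms: rounds are counted by a timer
 Asynchronous algorithms: rounds are determined by the number of waves of events
that propagate through the network
 Local Running Time
 Space
 Global → the total space used by all computers
 Local → the space needed by each computer
 Message complexity
 The total number of messages sent by the computers
 If a message M is transmitted over p edges, its message complexity is p|M|, where |M| is the length of M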

Basic Distributed Algorithms
 Ring Leader
 Tree Leader
 BFS
 MST

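Ring Leader
 Each process sends its id to the next process in the ring.
In the following rounds, each process performs these steps:
 Receive an id from the previous process
 Compare that id with its own id
 Send the smaller of the two values to the next process in the ring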

Algorithm
RingLeader(id):
  Input: The unique identifier, id, for the processor running
  Output: The smallest identifier of a processor in the ring
  M←[Candidate is id]
  Send message M to the successor processor in the ring
  done←false
  repeat
    Get message M from the predecessor processor in the ring
    if M=[Candidate is i] then
      if i=id then
        M←[Leader is id]
        done←true
      else
        m←min{i,id}
        M←[Candidate is m]
    else
      {M is a "Leader is" message}
      done←true
    Send message M to the next processor in the ring
  until done
  return M
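
To see the RingLeader pseudocode in action, the following Java sketch simulates it in synchronous rounds on a single machine. It assumes unique ids and models the ring as an array in which processor p receives from p-1; the class name and the array-based message encoding are illustrative assumptions.

```java
import java.util.Arrays;

// Round-based simulation of RingLeader. Each processor p reads the message
// its predecessor sent in the previous round and sends one message onward.
class RingLeaderSim {
    // ids must be unique; returns the leader id learned by each processor.
    static int[] run(int[] ids) {
        int n = ids.length;
        int[] outVal = ids.clone();             // value in each processor's last message
        boolean[] outIsLeader = new boolean[n]; // true if that message is "Leader is"
        boolean[] outValid = new boolean[n];    // true if a message was sent last round
        Arrays.fill(outValid, true);            // initial "Candidate is id" messages
        boolean[] done = new boolean[n];
        int[] result = new int[n];
        int running = n;
        while (running > 0) {
            int[] val = new int[n];
            boolean[] isLeader = new boolean[n];
            boolean[] valid = new boolean[n];
            for (int p = 0; p < n; p++) {
                int pred = (p + n - 1) % n;
                if (done[p] || !outValid[pred]) continue;
                if (!outIsLeader[pred] && outVal[pred] != ids[p]) {
                    // Still electing: forward the smaller candidate.
                    val[p] = Math.min(outVal[pred], ids[p]);
                } else {
                    // Own id came back, or a "Leader is" message arrived.
                    val[p] = outVal[pred];
                    isLeader[p] = true;
                    done[p] = true;
                    result[p] = val[p];
                    running--;
                }
                valid[p] = true;
            }
            outVal = val;
            outIsLeader = isLeader;
            outValid = valid;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(new int[] {5, 3, 8})));
    }
}
```

Every processor ends with the smallest id in the ring, matching the Output line of the pseudocode.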

Analysis
 Computational Rounds
 O(2N)
 Local Running Time
 O(N)
 Local Space
 O(1)
 Message Complexity
 O(N^2)

Tree Leader
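 Assume the network is a free tree
 Natural starting points
 The external nodes
 Asynchronous
 Message Check
 Checks whether a message has been sent over a particular edge and has arrived at the node
 Two phases
 Accumulation Phase
 ids flow in from the external nodes of the tree, keeping track of the node with the smallest id
 Determines the leader
 Broadcast Phase
 Broadcasts the leader id out to the external nodes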

Algorithm
TreeLeader(id):
  Input: The unique identifier, id, for the processor running
  Output: The smallest identifier of a processor in the tree
  {Accumulation Phase}
  Let d be the number of neighbors of processor id
  m←0 {counter for messages received}
  ℓ←id {tentative leader}
  repeat
    {begin a new round}
    for each neighbor j do
      check if a message from processor j has arrived
      if a message M = [Candidate is i] from j has arrived then
        ℓ←min{i,ℓ}
        m←m+1
  until m ≥ d-1
  if m=d then
    M←[Leader is ℓ]
    for each neighbor j do
      send message M to processor j
    return M {M is a "leader is" message}
  else
    M←[Candidate is ℓ]
    send M to the neighbor k that has not sent a message yet
  {Broadcast Phase}
  repeat
    {begin a new round}
    check if a message from processor k has arrived
    if a message M from k has arrived then
      m←m+1
      if M=[Candidate is i] then
        ℓ←min{i,ℓ}
        M←[Leader is ℓ]
        for each neighbor j do
          send message M to processor j
      else
        {M is a "leader is" message}
        for each neighbor j≠k do
          send message M to processor j
  until m=d
  return M {M is a "leader is" message}

Analysis
• di is the number of neighbors of processor i; D is the diameter of the tree
 Computational Rounds
 O(D)
 Local Running Time
 O(diD)
 Local Space
 O(di)
 Message Complexity
 O(N)

Tree Leader

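 Synchronous
 Like the ripples caused by a stone thrown into a pond
 The diameter is the length of the longest path between any two nodes in the graph
 The number of rounds equals the diameter
 Two phases
 Accumulation Phase: messages converge toward the center
 Broadcast Phase: the result spreads outward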

Breadth-first Search
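 Designate s as the source node
 Synchronous
 Spreads outward in waves
 Builds the BFS tree level by level, from the top down
 Each node v sends a message to the neighbors that have not contacted v before
 Every node v other than s must choose another node as its parent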

Algorithm
SynchronousBFS(v,s):
  Input: The identifier v of the node (processor) executing this algorithm and
    the identifier s of the start node of the BFS traversal
  Output: For each node v, its parent in a BFS tree rooted at s
  repeat
    {begin a new round}
    if v=s or v has received a message from one of its neighbors then
      set parent(v) to be a node requesting v to become its child
        (or null, if v=s)
      for each node w adjacent to v that has not contacted v yet do
        send a message to w asking w to become a child of v
  until v=s or v has received a message

Analysis
 n nodes, m edges
 Local Space
 O(n+m)

Breadth-first Search
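 Asynchronous
 Requires every processor to know the total number of processes in the network
 The root s sends out a "pulse" message that triggers the other processes
to begin the next round of the overall computation
 Combines two kinds of pulse
 Pulse-down: passed from the root s down the BFS tree
 Pulse-up: passed from the external nodes of the BFS tree up to the root s
 A new pulse-down is issued only
after the pulse-up has been received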

Algorithm
AsynchronousBFS(v,s):
  Input: The identifier v of the node (processor) executing this
    algorithm and the identifier s of the start node of the BFS
    traversal
  Output: For each node v, its parent in a BFS tree rooted at s
  C←∅ {verified BFS children for v}
  set A to be the set of neighbors of v
  repeat
    {begin a new round}
    if parent(v) is defined or v=s then
      if parent(v) is defined then
        wait for pulse-down message from parent(v)
      if C is not empty then
        {v is an internal node in the BFS tree}
        send a pulse-down message to all nodes in C
        wait for a pulse-up message from all nodes in C
      else
        {v is an external node in the BFS tree}
        for each node u in A do
          send a make-child message to u
        for each node u in A do
          get a message M from u and remove u from A
          if M is an accept-child message then
            add u to C
      send a pulse-up message to parent(v)
    else
      {v≠s has no parent yet}
      for each node w in A do
        if w has sent v a make-child message then
          remove w from A
          {w is no longer a candidate child for v}
          if parent(v) is undefined then
            parent(v)←w
            send an accept-child message to w
          else
            send a reject-child message to w
  until (v has received message done)
    or (v=s and has pulsed-down n-1 times)
  send a done message to all the nodes in C

Analysis
• n nodes, m edges
 Local Space
 O(n^2+m)

Minimum Spanning Tree
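 Builds on Borůvka's (Barůvka's) algorithm, an efficient sequential algorithm for finding an MST
 Distributed Borůvka algorithm in the synchronous model
 Determine all connected components
 For each connected component, find its minimum-weight edge
 Use that edge to join the component to another component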
Kruskal Algorithm
KruskalMST(G):
  Input: A simple connected weighted graph G
    with n vertices and m edges
  Output: A minimum spanning tree T for G
  for each vertex v in G do
    define an elementary cluster C(v)←{v}
  initialize a priority queue Q to contain all edges in G,
    using the weights as keys
  T←∅
  while T has fewer than n-1 edges do
    (u,v)←Q.removeMin()
    Let C(v) be the cluster containing v,
    Let C(u) be the cluster containing u.
    if C(v)≠C(u) then
      Add edge (v,u) to T.
      Merge C(v) and C(u) into one cluster,
        that is, union C(v) and C(u).
  return tree T
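
The KruskalMST pseudocode can be sketched in Java as below. Two substitutions are assumptions of this sketch rather than part of the slides: a sorted array stands in for the priority queue Q, and a union-find array stands in for the clusters C(v). Vertices are numbered 0 to n-1 and each edge is stored as {u, v, weight}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of Kruskal's algorithm with a sorted edge array and union-find.
class KruskalSketch {
    // Find the representative of x's cluster, shortening the path on the way.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Returns the edges of a minimum spanning tree of a connected graph.
    static List<int[]> mst(int n, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2]));
        int[] parent = new int[n];
        for (int v = 0; v < n; v++) parent[v] = v; // elementary clusters
        List<int[]> tree = new ArrayList<>();
        for (int[] e : sorted) {
            if (tree.size() == n - 1) break;
            int cu = find(parent, e[0]);
            int cv = find(parent, e[1]);
            if (cu != cv) {        // endpoints lie in different clusters
                tree.add(e);
                parent[cu] = cv;   // merge the two clusters
            }
        }
        return tree;
    }

    static int totalWeight(List<int[]> tree) {
        int sum = 0;
        for (int[] e : tree) sum += e[2];
        return sum;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1, 1}, {1, 2, 2}, {0, 2, 3}, {2, 3, 4}, {1, 3, 5} };
        List<int[]> tree = mst(4, edges);
        System.out.println(tree.size() + " edges, total weight " + totalWeight(tree));
    }
}
```

On the sample graph the edge of weight 3 is skipped because both endpoints are already in one cluster, leaving edges of weight 1, 2 and 4.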

Analysis
• n nodes, m edges
 O(log n)
 Local Space
 O(m)
 O(m log n)

Synchronization Algorithms
 Multicast
 Uses a central time server to synchronize clocks
 Cristian's algorithm (centralised)
 Berkeley algorithm (centralised)
 The Network Time Protocol (decentralised)


Cristian's Algorithm (1989)
 Uses a time server to synchronize clocks; the server keeps the reference time
 Clients ask the time server for time
 period depends on maximum clock drift and accuracy required
 Clients receive the value and may:
 use it as it is
 add the known minimum network delay
 add half the time between this send and receive
 For links with symmetrical latency:
 RTT = resp.-received-time – req.-sent-time
 adjusted-local-time =
 server-timestamp + minimum network delay or
 server-timestamp + (RTT / 2) or
 server-timestamp + (RTT – server-latency) /2
 local-clock-error = adjusted-local-time – local-time
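
The adjustment formulas above can be written out as a small Java sketch. It assumes symmetric link latency and timestamps in milliseconds; the class and method names and the sample numbers are illustrative assumptions.

```java
// Sketch of the client-side arithmetic in Cristian's algorithm.
class CristianSketch {
    // server-timestamp + RTT / 2
    static long adjustedTime(long requestSent, long responseReceived, long serverTimestamp) {
        long rtt = responseReceived - requestSent;
        return serverTimestamp + rtt / 2;
    }

    // server-timestamp + (RTT - server-latency) / 2
    static long adjustedTime(long requestSent, long responseReceived,
                             long serverTimestamp, long serverLatency) {
        long rtt = responseReceived - requestSent;
        return serverTimestamp + (rtt - serverLatency) / 2;
    }

    // local-clock-error = adjusted-local-time - local-time
    static long clockError(long adjustedLocalTime, long localTime) {
        return adjustedLocalTime - localTime;
    }

    public static void main(String[] args) {
        long sent = 1000, received = 1020, serverTs = 5000; // RTT = 20 ms
        long adjusted = adjustedTime(sent, received, serverTs);
        System.out.println("adjusted = " + adjusted);
        System.out.println("error    = " + clockError(adjusted, received));
    }
}
```

Subtracting the server's processing latency before halving matters only when that latency is known and not negligible relative to the round trip.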

Berkeley algorithm (Gusella & Zatti, 1989)
 if no machines have receivers, …
 Berkeley algorithm uses a designated server to
synchronize

 The designated server polls or broadcasts
to all machines for their time,
adjusts times received for RTT & latency,
averages times, and tells each machine how to adjust.

 Polling is done using Cristian's algorithm

 Avg. time is more accurate, but still drifts
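
The averaging step of the Berkeley algorithm can be sketched as follows. The sketch assumes the readings have already been compensated for RTT and latency and uses a plain mean; the class name, method name and sample values (minutes since midnight) are hypothetical.

```java
import java.util.Arrays;

// Sketch of the designated server's averaging step in the Berkeley algorithm.
class BerkeleySketch {
    // times[i] is machine i's compensated clock reading (index 0 may be the
    // server itself). Returns the amount each machine should add to its clock.
    static long[] adjustments(long[] times) {
        long sum = 0;
        for (long t : times) sum += t;
        long average = sum / times.length;
        long[] adjust = new long[times.length];
        for (int i = 0; i < times.length; i++) {
            adjust[i] = average - times[i]; // positive: advance, negative: slow down
        }
        return adjust;
    }

    public static void main(String[] args) {
        long[] readings = {180, 205, 170}; // 3:00, 3:25 and 2:50
        System.out.println(Arrays.toString(adjustments(readings)));
    }
}
```

Sending each machine an offset rather than the absolute average avoids adding another transmission delay to the value itself.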

Network Time Protocol
 NTP is the best known and most widely implemented
decentralised algorithm
 Used for time synchronization on Internet

[Figure: NTP server hierarchy (www.ntp.org). Level 1: primary server, direct synchronization. Level 2: secondary servers, synchronized by the primary server. Level 3: tertiary servers, synchronized by the secondary server.]

Assumptions
 Each pair of processes is connected by reliable
channels (such as TCP).
 Messages are eventually delivered to recipients' input
buffer.
 Processes will not fail.
 There is agreement on how a resource is identified
 Pass identifier with requests

Exclusive Access Algorithm
 Centralized Algorithm
 Token Ring Algorithm
 Lamport Algorithm
(Timestamp Approach)
 Ricart & Agrawala Algorithm
 Leader Election Algorithms
 Bully Algorithm
 Ring Algorithm
 Chang&Roberts Algorithm
 Itai&Rodeh Algorithm

Centralized Algorithm
Operations
1. Request resource
 Send request to coordinator to enter CS
2. Wait for response
3. Receive grant
 Coordinator grants permission to enter CS
 Coordinator keeps a queue of requests to enter the CS
4. Access resource
5. Release resource
 Send release message to inform coordinator
 Safety, liveness and order are guaranteed
Delay
 Client and Synchronization
 one round trip time (release + grant)
[Figure: process P sends Request(R) to coordinator C, receives Grant(R), and later sends Release(R). The coordinator keeps a queue of requests from processes P1 to P4.]

Token Ring Algorithm
Operations
 For each CS a token is used.
 Only the process holding the token can enter the CS.
 To exit the CS, the process sends the token onto its neighbor.
 If a process does not require to enter the CS when it receives the
token, it forwards the token to the next neighbor.
 Only one process holds the token at any time, which guarantees mutual exclusion
 The order is well defined, so starvation cannot occur
 If the token is lost (e.g. the process holding it died), it must be regenerated
 Safety & liveness are guaranteed, but ordering is not.
Delay
 Client: 0 to N message transmissions.
 Synchronization: between one process's exit from the CS and the next
process's entry is between 1 and N message transmissions.

Lamport Algorithm
 A total ordering of requests is established by logical
timestamps.
 Each process maintains request Queue (mutual exclusion requests)
 Requesting CS, Pi
 multicasts “request” (i, Ti) to all processes (Ti is local Lamport time).
 Places request on its own queue
 waits until all processes “reply”
 Entering CS, Pi
 receives message (ack or release) from every other process with a
timestamp larger than Ti
 Releasing CS , Pi
 Remove request from its queue
 Send a timestamped release message
 This may cause another process's entry to have the earliest timestamp in its
queue, enabling that process to access the critical section

Ricart & Agrawala Algorithm
 Using reliable multicast and logical clocks
 Process wants to enter critical section
 Compose message containing
 Identifier (machine ID, process ID)
 Name of resource
 Current time
 Send request to all processes ,wait until everyone gives permission
 When process receives request
 If receiver not interested →Send OK to sender
 If receiver is in critical section →Do not reply; add request to queue
 If receiver just sent a request as well:
 Compare timestamps: received & sent msgs→Earliest wins
 If receiver is loser then send OK else receiver is winner, do not reply, queue
 When done with critical section→Send OK to all queued requests

On initialization
  state := RELEASED;
To enter the critical section
  state := WANTED;
  Multicast request to all processes; {request processing deferred here}
  T := request's timestamp;
  Wait until (number of replies received = (N – 1));
  state := HELD;
On receipt of a request <Ti, pi> at pj (i ≠ j)
  if (state = HELD) or ((state = WANTED) and ((T, pj) < (Ti, pi)))
  then queue request from pi without replying;
  else reply immediately to pi;
To exit the critical section
  state := RELEASED;
  reply to any queued requests;
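
The rule applied on receipt of a request can be isolated in a small Java sketch. It assumes integer process ids as the tie-breaker for equal timestamps; the class, enum and method names and the sample timestamps are illustrative, and networking is left out entirely.

```java
// Sketch of the "on receipt of a request" decision in Ricart & Agrawala.
class RicartAgrawalaSketch {
    enum State { RELEASED, WANTED, HELD }

    // Returns true if pj must queue the request <ti, pi> without replying.
    // t is the timestamp of pj's own pending request (used only when WANTED).
    static boolean mustDefer(State state, long t, int pj, long ti, int pi) {
        if (state == State.HELD) {
            return true;
        }
        if (state == State.WANTED) {
            // Lexicographic comparison (T, pj) < (Ti, pi): earlier request wins.
            return t < ti || (t == ti && pj < pi);
        }
        return false; // RELEASED: reply immediately
    }

    public static void main(String[] args) {
        // Process 2 wants the CS with timestamp 78; process 1 asks with 87.
        System.out.println(mustDefer(State.WANTED, 78, 2, 87, 1)); // process 2 defers
        // Process 1 wants the CS with timestamp 87; process 2 asks with 78.
        System.out.println(mustDefer(State.WANTED, 87, 1, 78, 2)); // process 1 replies
    }
}
```

Breaking ties on the process id makes the comparison a total order, so two concurrent requests can never both defer each other.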

 Safety, liveness, and ordering are guaranteed.
 It takes 2(N-1) messages per entry operation (N-1 multicast
requests + N-1 replies); N messages if the underlying network
supports multicast. [3(N-1) in Lamport's algorithm]
Delay
 Client
 one round-trip time
 Synchronization
 one message transmission time

[Figure: processes P1, P2 and P3. P1 and P2 request the critical section concurrently; P1's message has timestamp 87 and P2's message has timestamp 78. P2 cannot send a Reply to P1, because P1's timestamp is greater than P2's. P2 changes to "held", and P1 remains in "wanted" until P2 sends "reply".]

Leader Election Algorithms
 The problem
 N processes, may or may not have unique IDs (UIDs)
 for simplicity assume no crashes
 must choose unique master coordinator amongst processes
 Requirements
 Every process knows P, identity of leader, where P is unique
process id (usually maximum) or is yet undefined.
 All processes participate and eventually discover the identity
of the leader (cannot be undefined).
 When a coordinator fails, the algorithm must elect that active
process with the largest priority number
 Two types of algorithm
 Bully: “the biggest guy in town wins”
 Ring: a logical, cyclic grouping

Bully Algorithm
 Assumptions
 Synchronous system
 All messages arrive within Ttrans units of time.
 A reply is dispatched within Tprocess units of time of the receipt of a message.
 if no response is received in 2Ttrans + Tprocess, the node is assumed to be dead.

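 If a process knows it has the highest id, it elects itself coordinator
and sends a coordinator message to all processes with lower ids
 When process P notices that the coordinator has not responded to requests for too long, it initiates an election
 When process P starts an election, it sends an election message to the processes with higher ids
 If nobody responds, P becomes the coordinator
 If a higher-numbered process answers, P's job is done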

Bully Algorithm
 Performance
 Best case scenario: The process with the second highest id
notices the failure of the coordinator and elects itself.
 N-2 coordinator messages are sent.
 Turnaround time is one message transmission time.
 Worst case scenario: When the process with the least id
detects the failure.
 N-1 processes altogether begin elections, each sending messages to
processes with higher ids.
 The message overhead is O(N^2).
 Turnaround time is approximately 5 message transmission times.

Ring Algorithm
 No token is used in this algorithm
 When the algorithm ends, every process holds an active list (consisting of all the
priority numbers of all active processes in the system)
 If process Pi detects a coordinator failure, it creates a new, initially empty active
list, sends the message elect(i) to its right neighbor, and adds the number i
to its active list
 If Pi receives a message elect(j) from the process on its left, it must respond in one of three ways
 If this is the first elect message it has seen or sent, Pi creates a new
active list with the numbers i and j, then sends the message elect(i) followed by elect(j)
 If i ≠ j, Pi adds j to its active list and forwards the message to its
right neighbor
 If i = j, Pi has received its own message elect(i): the active list for Pi now
contains the numbers of all the active processes in the system, so Pi can determine
the largest number in the list to identify the new coordinator process.

Chang&Roberts Algorithm
 Assume
 Unidirectional ring
 Asynchronous system
 Each Process has UID

 Election
 initially each process non-participant
 determine leader (election message):
 initiator becomes participant and passes own UID on to neighbour
 when non-participant receives election message, forwards maximum
of own and the received UID and becomes participant
 a participant does not forward an election message whose UID is smaller than its own
 announce winner (elected message):
 when participant receives election message with own UID, becomes
leader and non-participant, and forwards UID in elected message
 otherwise, records the leader's UID, becomes non-participant and
forwards it
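
The election phase can be traced with a short Java simulation. It is a sketch under several assumptions: a single initiator, unique UIDs, no failures, and the ring modelled as an array in which index k passes messages to index (k+1) mod n. The class and method names are hypothetical, and the later "elected" announcement round is omitted.

```java
// Sketch: election phase of Chang & Roberts with one initiator.
class ChangRobertsSim {
    // Returns the UID of the elected leader (the maximum UID in the ring).
    static int elect(int[] uids, int initiator) {
        int n = uids.length;
        int msg = uids[initiator];       // initiator passes on its own UID
        int at = (initiator + 1) % n;    // process receiving the message
        // A process that receives its own UID becomes the leader.
        while (msg != uids[at]) {
            // Forward the maximum of the received UID and the local UID.
            // With a single initiator the message is never smaller than the
            // UID of a process it has already visited, so it is never dropped.
            msg = Math.max(msg, uids[at]);
            at = (at + 1) % n;
        }
        return msg;
    }

    public static void main(String[] args) {
        int[] ring = {3, 9, 4, 7};
        System.out.println("leader = " + elect(ring, 0));
    }
}
```

Whichever process initiates, the message settles on the largest UID during its first lap and is recognized by its owner on the second.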

Itai&Rodeh Algorithm
 Assume
 Unidirectional ring
 Synchronous system
 Processes do not have UIDs

 Election
 each process selects ID at random from set {1,..K}
 non-unique! but fast
 process pass all IDs around the ring
 after one round, if there exists a unique ID then
elect maximum unique ID
 otherwise, repeat

 How do we know the algorithm terminates?
 from probabilities: if you keep flipping a fair coin, then with probability 1
you eventually get tails after some run of heads
