Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

Roberto Baldoni, Università di Roma “La Sapienza”
Retirement Seminar for Professor Santosh Shrivastava, 8th of September 2011, Newcastle, UK
Roberto Baldoni, “The price of mastering churn in a distributed system”

Santosh reminds me … a set of acronyms
Large and promising IP rejected: too many Chinese!
FET IP, very strong consortium, rejected. Reason: «very nice projects, however it wants to provide a real software platform for pooling together on-demand resources in a multi-tenant environment resistant to byzantine attack…. in FET program we do not fund engineering work»
Just below the bar!

Outline

Advent of Complex Distributed Applications

Managed vs. Unmanaged distributed applications (i)

Managed Distributed Applications: Consequences
[Figure: the set of entities evolving over time: N entities, N-1 entities, N-2, N+3]

Managed vs. Unmanaged distributed applications (ii)

Unmanaged distributed applications: Consequences

Spectrum of Possible System Models
[Figure: a spectrum of worlds from orderly (static, managed distributed systems) to chaotic (dynamic, unmanaged distributed systems), with examples placed along it: air traffic control, mobile ad-hoc systems, cloud computing, peer-to-peer]

Uncertainty in Dynamic Distributed Systems

System Model with Churn


Abstractions

[Figure: a distributed system under churn, with the distributed computation built from a connectivity protocol, communication protocols and the abstraction]
For simplicity, we assume that N processes are in the distributed computation at any given time.

Object Abstraction: The Regular Register
A register is a shared variable accessed by processes through read and write operations.
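The regular-register guarantee stated later in the deck (a read returns the last value written before it, or a value written by a write concurrent with it) can be checked mechanically on a recorded history. The sketch below assumes a single writer and represents each operation as a hypothetical (start, end, value) interval; the names Op, allowed_read_values and is_regular are ours, not from the talk.

```python
from dataclasses import dataclass


@dataclass
class Op:
    start: float  # invocation time
    end: float    # response time
    value: int    # value written (for a write) or returned (for a read)


def allowed_read_values(read: Op, writes: list[Op], initial: int = 0) -> set[int]:
    """Values a regular register may return for `read` (single-writer history)."""
    # Writes that completed before the read was invoked, ordered by completion.
    completed = sorted((w for w in writes if w.end < read.start), key=lambda w: w.end)
    last = completed[-1].value if completed else initial
    # Writes whose interval overlaps the read are concurrent with it.
    concurrent = {w.value for w in writes if w.start <= read.end and w.end >= read.start}
    return {last} | concurrent


def is_regular(reads: list[Op], writes: list[Op], initial: int = 0) -> bool:
    """True if every read returns a value allowed by regular semantics."""
    return all(r.value in allowed_read_values(r, writes, initial) for r in reads)
```

For example, with writes of 1 during [0, 1] and of 2 during [5, 7], a read during [6, 8] may return 1 or 2, while a read during [8, 9] must return 2.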

Regular Register Architecture at Node i
[Figure: at node i, the regular register (REG) sits on top of the point-to-point link and broadcast primitives, which run over the connectivity layer; REG exports join(), read() and write(v) to the computation]
Point-to-point link: if p_i invokes send(m) to p_j at time t, then p_j receives m by time t + δ, provided it has not left the system by that time.
Broadcast: if p_i invokes broadcast(m) at time t and does not leave the system by time t + δ, then every process that is in the system at time t and does not leave it by time t + δ delivers m by time t + δ.
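The timely-broadcast property can be read as a rule that picks out which processes are obliged to deliver a message. A minimal sketch, assuming each process is described by a hypothetical join/leave lifetime; Lifetime and guaranteed_receivers are illustrative names, not part of the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class Lifetime:
    join: float              # time the process enters the system
    leave: float = math.inf  # time it leaves (inf = never leaves)


def guaranteed_receivers(sender: str, t: float, delta: float,
                         procs: dict[str, Lifetime]) -> set[str]:
    """Processes the broadcast property obliges to deliver a message that
    `sender` broadcasts at time t, by time t + delta."""
    if procs[sender].leave <= t + delta:
        # The sender leaves too early: the property promises nothing.
        return set()
    # Present at time t and still present after t + delta.
    return {pid for pid, life in procs.items()
            if life.join <= t and life.leave > t + delta}
```

Processes that join after t are not covered, which is why a joining process cannot rely on broadcasts alone to learn the register value.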

Regular Register: write()
The writer process p_w wants to write the value v: p_w broadcasts the message (WRITE, v, sn), where sn is a sequence number. Meanwhile, processes join and leave the computation.
Observation: only the processes that belong to the computation when p_w starts the write, and that remain in it for the whole duration of the write, will hold the updated copy of the register. Active processes keep the state of the computation.
[Figure: within the distributed system, a subset of processes, including the writer p_w, participates in the register computation]
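The write step can be sketched as follows, assuming that each process keeps the (value, sn) pair with the highest sequence number it has seen and that broadcast is replaced by a direct loop over the processes currently in the computation. The classes Replica and Writer are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    pid: str
    value: object = None  # None stands for the undefined value ⊥
    sn: int = -1          # sequence number of the stored value

    def on_write(self, v, sn: int) -> None:
        # Adopt only a newer write; stale or duplicate messages are ignored.
        if sn > self.sn:
            self.value, self.sn = v, sn


@dataclass
class Writer:
    sn: int = 0

    def write(self, v, replicas: list[Replica]) -> tuple:
        self.sn += 1
        msg = ("WRITE", v, self.sn)
        # Stand-in for broadcast(): only processes currently in the
        # computation (those in `replicas`) receive the update.
        for r in replicas:
            r.on_write(v, self.sn)
        return msg
```

A replica left out of the list (for instance, one that joined after the write started) keeps its old copy, which is exactly the situation the observation above describes.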

Processes in the Distributed Computation vs. Active Processes
[Figure: number of processes over time t. The computation size N stays constant (joining processes = leaving processes), while the number of active processes A(t) varies under churn and must stay above a correctness bound]

Movement of the bound depends on the system model: the weaker the system model, the more «static» the system has to be, i.e., the less churn it can tolerate. When A(t) falls below the correctness bound, liveness and safety issues arise. This leads to several impossibility results in the presence of churn.

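To make the picture concrete, here is a small simulation built on assumptions of our own: time advances in integer steps, at each step a fixed number of random members leave and the same number of new processes join (so the computation always has N members, as the figure assumes), and a joiner becomes active only after a fixed join duration. The function simulate_active and its parameters are hypothetical.

```python
import random


def simulate_active(n: int, churn_per_step: int, join_duration: int,
                    steps: int, seed: int = 0) -> list[int]:
    """Return |A(t)| for t = 0..steps-1 under a constant-size churn model."""
    rng = random.Random(seed)
    # Initial members joined long enough ago to be active at t = 0.
    joined_at = {pid: -join_duration for pid in range(n)}
    next_pid = n
    history = []
    for t in range(steps):
        # Departures: churn_per_step random members leave.
        for pid in rng.sample(sorted(joined_at), churn_per_step):
            del joined_at[pid]
        # Arrivals: the same number of new processes join at time t.
        for _ in range(churn_per_step):
            joined_at[next_pid] = t
            next_pid += 1
        # Active = joined at least join_duration steps ago.
        active = sum(1 for s in joined_at.values() if t - s >= join_duration)
        history.append(active)
    return history
```

With N = 100, five replacements per step and a three-step join, |A(t)| never drops below 100 − 5·3 = 85, because only processes that joined in the last three steps can be non-active.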

An Algorithm for Synchronous Systems

Synchronous System Safety: Case register_i ≠ ⊥
[Figure: timeline of processes p_i, p_j, p_h, p_k: while p_i executes join(), a write(1) broadcasts WRITE(1, 1); p_j, p_h and p_k update their copies from 0 to 1, and p_i also receives WRITE(1, 1), so register_i ≠ ⊥; legend: join, write, reply]

Synchronous System Safety: Case register_i = ⊥
[Figure: p_i invokes join() and receives no WRITE message, so it broadcasts INQUIRY(i); p_h answers REPLY(h, 0, 0), i.e., value 0 with sequence number 0; legend: join, write, reply]
If no write is concurrent with the join operation and $c < 1/(3\delta)$, then there always exists an active process that replies with the last written value.
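A back-of-the-envelope reading of the condition on c, under assumptions we state explicitly: c is the fraction of the n processes that join (and, symmetrically, leave) per time unit, and a join completes within 3δ, as in the horizontal-quorum slides. At any time t the only non-active members of the computation are those that joined during the last 3δ, at most 3δ·c·n of them, so:

$$
|A(t)| \;\ge\; n - \underbrace{3\delta \cdot c\,n}_{\text{joins still in progress}} \;=\; n\,(1 - 3\delta c) \;>\; 0
\quad\Longleftrightarrow\quad c < \frac{1}{3\delta}
$$

This only shows that some active process exists; the protocol's full correctness argument needs more than this intuition.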

Synchronous System Safety: Case register_i = ⊥ with a Concurrent Write
[Figure: p_i invokes join() while write(1) is in progress; p_i broadcasts INQUIRY(i), receives REPLY(h, 0, 0) from p_h and WRITE(1, 1) from the writer; the other processes' copies go from 0 to 1]
p_i can receive both WRITE(⟨val, sn⟩) messages and REPLY(⟨j, val, sn⟩) messages. Depending on the values received by time τ + 2δ, p_i updates register_i either to the value written by a concurrent write or to the value written before the concurrent writes. If p_i receives the WRITE before the REPLY, it does not overwrite the newer value with the older one, so any subsequent read returns the last value written.
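The two join cases can be condensed into a small decision function. This is a sketch under our own assumptions, not the protocol's code: message collection and the waiting periods are abstracted away, the name join_outcome and its arguments are hypothetical, and we assume p_i keeps whichever value carries the highest sequence number.

```python
BOTTOM = None  # stands for the undefined register value ⊥


def join_outcome(early_writes: list[tuple[object, int]],
                 replies: list[tuple[str, object, int]],
                 late_writes: list[tuple[object, int]]) -> tuple[object, int, bool]:
    """Value adopted by a joining process p_i.

    early_writes: WRITE(val, sn) received before p_i decides whether to inquire;
    replies:      REPLY(j, val, sn) received after INQUIRY(i);
    late_writes:  WRITE(val, sn) received while waiting for replies.
    Returns (value, sn, inquired)."""
    value, sn = BOTTOM, -1
    for v, s in early_writes:
        if s > sn:
            value, sn = v, s
    if sn >= 0:
        # Case register_i ≠ ⊥: a concurrent write already gave p_i a value.
        return value, sn, False
    # Case register_i = ⊥: inquire and merge replies with concurrent writes.
    for v, s in [(v, s) for _, v, s in replies] + list(late_writes):
        if s > sn:  # never replace a newer value with an older one
            value, sn = v, s
    return value, sn, True
```

Because the merge keeps the highest sn regardless of arrival order, a WRITE(1, 1) received before REPLY(h, 0, 0) is not overwritten by the older value.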

Synchronous System

Horizontal Quorums for Register Persistence
[Figure: membership of the register computation over time; a joining process needs 3δ to become active; legend: active process, non-active process]


Eventually Synchronous System

Vertical Quorums for Register Validity in Asynchronous Periods
Termination. Assume that |A(t)| > n/2 (i.e., a majority of processes is active at any time). If a process invokes join(), read() or write() and does not leave the system, it terminates its operation.
Safety. Assume that |A(t)| > n/2. A read operation returns the last value written before the read invocation, or a value written by a write operation concurrent with it.
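Why a majority: in asynchronous periods timeouts cannot be trusted, so an operation waits for replies from a majority, and any two majorities share at least one process that carries the latest value. The sketch below is a generic majority-quorum illustration, not necessarily the protocol on the slide; majority, majorities_intersect and quorum_read are hypothetical names, and replies are assumed to be (value, sn) pairs keyed by process id.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest quorum size q with q > n/2."""
    return n // 2 + 1


def majorities_intersect(n: int) -> bool:
    """Exhaustively check that every two majority quorums over n processes overlap."""
    quorums = [set(c) for c in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


def quorum_read(replies: dict[int, tuple[object, int]], n: int) -> object:
    """Once a majority has replied, return the value with the highest sn."""
    if len(replies) < majority(n):
        raise RuntimeError("still waiting for a majority of replies")
    value, _ = max(replies.values(), key=lambda vs: vs[1])
    return value
```

The assumption |A(t)| > n/2 is what makes waiting for a majority safe for termination: enough active processes exist to answer.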

Asynchronous System

Regular Register with Byzantine Failures

[Figure: layered architecture: connection layer (e.g., overlay management protocol); (authenticated) communication layer with best-effort semantics; distributed computation (i.e., the regular register)]

Computation Model
[Figure: write(v) and read() operations invoked on the servers]

[Figure: servers store the value v, while Byzantine servers may store arbitrary values x; new servers enter through join_Server()]

Requirements
Write persistency: servers maintain the last value written by a write operation despite server departures.
Byzantine resiliency: there are always at least f+1 servers maintaining the same value.
Read validity: any read() operation returns the last value written by a completed write(), or a value written concurrently.
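The Byzantine-resiliency requirement matters because, with at most f Byzantine servers, a value reported by at least f + 1 servers is reported by at least one correct server. A minimal sketch of that filtering step, assuming server responses are hashable values; vouched_values is a hypothetical helper, not the read algorithm from the talk.

```python
from collections import Counter


def vouched_values(responses: list, f: int) -> set:
    """Values reported by at least f + 1 servers.

    With at most f Byzantine servers, each such value is held by at least
    one correct server, so a forged value cannot pass this filter alone."""
    counts = Counter(responses)
    return {v for v, k in counts.items() if k >= f + 1}
```

More than one value can pass the filter (for instance, an old value and one from a concurrent write); regular semantics allow the read to return either.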

Issues in read() Operations
[Figure: server states at times t_1, t_2, ..., t_i, ..., t_k, with correct servers holding v and Byzantine servers holding arbitrary values such as x and y]

Validity Bound
Theorem. Let $A_{JS}$, $A_R$ and $A_W$ be the algorithms implementing, respectively, the join_Server(), read() and write() operations, and let $\Delta t_j$, $\Delta t_r$ and $\Delta t_w$ be the maximum time these algorithms need to terminate the corresponding operation. If $c \geq \min\left\{\frac{n-3f}{n\,\Delta t_r}, \frac{n-3f}{n\,(\Delta t_j + \Delta t_w)}\right\}$, then it is not possible to ensure both write persistency and read validity.
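The threshold in the theorem is easy to evaluate for concrete parameters. A small sketch, assuming c is the fraction of the n servers replaced per time unit and that the lost relational symbol in the theorem is ≥; churn_threshold is a hypothetical name.

```python
def churn_threshold(n: int, f: int, dt_join: float, dt_read: float,
                    dt_write: float) -> float:
    """Churn rate at or above which, per the theorem, write persistency and
    read validity cannot both be ensured (n servers, at most f Byzantine)."""
    return min((n - 3 * f) / (n * dt_read),
               (n - 3 * f) / (n * (dt_join + dt_write)))
```

For example, with n = 10, f = 1, Δt_j = 2, Δt_w = 1 and Δt_r = 1, the second term dominates and the threshold is 7/30 ≈ 0.233: slower joins and writes shrink the churn the system can absorb.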

Validity Bound in a synchronous system

Pictorial Related Work and Summary of Results for the Regular Register
[Figure: three axes: system model (asynchronous, eventually synchronous, synchronous), churn model (static, quiescent, continuous) and failure model (crash, Byzantine); works placed in this space: Aguilera et al. PODC 2010, Baldoni et al. ICDCS 2009, Baldoni et al. PODC 2011]

[Table: results for the regular register by system model and failure model (rows) and by churn model: no churn, quiescent churn, continuous churn (columns)]
Synchronous, crash: Baldoni et al. ICDCS 2009
Synchronous, Byzantine: BFT papers; Baldoni et al. PODC 2011 (ba)
Eventually synchronous, crash: Baldoni et al. ICDCS 2009
Eventually synchronous, Byzantine: open problem
Asynchronous, crash: Aguilera et al. 2009; impossible
Asynchronous, Byzantine: open problem

Other Abstractions we faced


 done in 2 Steps
[Figure: labels: leader, alive list, send/receive, multicast/receive, HB*, unicast, multicast]
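The labels on this slide (leader, alive list, HB*) suggest a leader chosen from a heartbeat-maintained alive list. As a hedged illustration of that general idea only (the slide's actual two-step construction is not recoverable here), the sketch below assumes a fixed heartbeat timeout and a smallest-id leader rule; the class AliveList and its methods are hypothetical.

```python
class AliveList:
    """Heartbeat-based alive list with a smallest-id leader rule (sketch)."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_hb: dict[int, float] = {}  # process id -> time of last heartbeat

    def on_heartbeat(self, pid: int, now: float) -> None:
        # Step 1: record heartbeats received over unicast or multicast.
        self.last_hb[pid] = now

    def alive(self, now: float) -> set[int]:
        # A process counts as alive if its last heartbeat is at most `timeout` old.
        return {p for p, t in self.last_hb.items() if now - t <= self.timeout}

    def leader(self, now: float):
        # Step 2: deterministically pick a leader from the alive list.
        live = self.alive(now)
        return min(live) if live else None
```

Under churn, the timeout trades detection speed against false suspicions: a shorter timeout drops departed processes sooner but may also drop slow ones.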

Conclusion

One Slide to Remember
[Figure: processes in the distributed computation vs. active processes over time: N stays constant (joining processes = leaving processes), A(t) must stay above the correctness bound, and liveness and safety issues arise below it]
Movement of the bound depends on the system model: the weaker the system model, the more «static» the system has to be. This leads to several impossibility results in the presence of churn.
